Abstract. Uncertainty quantification is a primary challenge for reliable modeling and simulation of complex stochastic
dynamics. Such problems are typically plagued with incomplete information that may enter as uncertainty in the model 
parameters, or even in the model itself. Furthermore, due to their dynamic nature, we need to assess the impact of these 
uncertainties on the transient and long-time behavior of the stochastic models and derive corresponding uncertainty bounds for 
observables of interest. A special class of such challenges is parametric uncertainties in the model and in particular sensitivity 
analysis along with the corresponding sensitivity bounds for stochastic dynamics. Moreover, sensitivity analysis can be further 
complicated in models with a high number of parameters that render straightforward approaches, such as gradient methods, 
impractical. In this paper, we derive uncertainty and sensitivity bounds for path-space observables of stochastic dynamics in 
terms of new goal-oriented divergences; the latter incorporate both observables and information theory objects such as the 
relative entropy rate. These bounds are tight, depend on the variance of the particular observable and are computable through 
Monte Carlo simulation. In the case of sensitivity analysis, the derived sensitivity bounds rely on the path Fisher Information 
Matrix, hence they depend only on local dynamics and are gradient-free. These features allow for computationally efficient 
implementation in systems with a high number of parameters, e.g., complex reaction networks and molecular simulations.

1. Introduction. In this paper, we derive uncertainty and sensitivity bounds for path-space observables
of stochastic dynamics in terms of suitable information theoretic divergences such as relative entropy rate 
(RER) and path-space Fisher Information Matrix (pFIM). Reliable modeling and simulation of complex 
systems often suffers from incomplete information that may enter as uncertainty in the model parameters, 
or even in the model itself. Here we develop an approach that provides uncertainty bounds for observables 
of interest in the transient and long-time behavior of the stochastic models. The bounds are expressed in 
terms of a new goal-oriented divergence that incorporates observables, as well as path-space information 
theory objects such as the relative entropy rate. The presented method also yields bounds on parametric 
sensitivity for stochastic dynamics, e.g., for solutions to stochastic differential equations. It is particularly 
useful in realistic stochastic models, for example, biochemical reaction networks, which are characterized by 
a high number of parameters that render classic sensitivity analysis approaches, such as gradient methods, 
impractical. We present sensitivity bounds that are computable and sufficiently sharp. 

Estimating sensitivity indices is a common task in many applications ranging from engineering
and financial mathematics to biochemistry. Methods that apply Monte Carlo simulations to estimate
the gradients directly include finite-difference approximations combined with coupling methods [MJ [TJ [5],
likelihood ratio and Girsanov methods [niisa, polynomial chaos expansions m, path-wise methods [36],
linear response [12], etc. In another direction, information-based sensitivity analysis approaches have been
proposed as a means to quantify the overall behavior of the system and not just the response of a specific
observable function, namisD]. These sensitivity analysis methods employ information theory metrics such
as the relative entropy (also known as the Kullback-Leibler divergence) as well as the Fisher Information
Matrix (FIM). Moreover, taking into account that the stationary distribution is rarely known in complex
stochastic dynamics, these information-based methods resort either to linearized Gaussian approximations of
the underlying process m, or they rely on path-space objects such as the relative entropy rate and the path
Fisher Information Matrix [30, 31]. The latter approach is exact since no approximation is necessary. It is
also gradient-free in the sense that simulation for a single model (parameter) yields bounds for all parameter 
perturbations. 

Overall, gradient-free sensitivity analysis methods such as the ones based on pFIM, [m misni E] are 
highly appropriate for systems with a high-dimensional parameter space since they allow for an efficient 
exploration of the parameter space without the calculation of a very high number of directional derivatives. 
In the stochastic dynamics setting, the bounds we present avoid expensive Monte Carlo simulations of 
sensitivity indices by providing error bounds for them. The derived bounds are based on the path-space FIM 
and are obtained from different inequalities and representations of relative entropy. It is also desirable to 
provide bounds based on Fisher information because the (static) FIM is a tool extensively utilized in optimal 
experimental design, as well as in statistics, for estimation, identifiability, etc. Moreover, in order to obtain 
the tightest possible bounds, it is crucial to find the optimal constant that multiplies the Fisher information 
in these inequalities. 

The presented results rely in part on an upper bound derived recently in [7] and a companion lower bound
in m, for functionals of probability measures $P \in \mathcal{P}(\Omega)$ and $Q \in \mathcal{P}(\Omega)$, where $Q$ is viewed as the “true”
probabilistic model, and $P$ is a computationally tractable “nominal” or “reference” model, e.g., a surrogate
model. In this paper we start our analysis by showing that these inequalities, for bounded observables $f$ of
random variables with probabilities $P$ and $Q$, can be rewritten in the form

$$\Xi_-(Q, P; f) \le \mathbb{E}_Q[f] - \mathbb{E}_P[f] \le \Xi_+(Q, P; f), \tag{1.1}$$

where $\Xi_+(Q, P; f) \ge 0$ ($\Xi_-(Q, P; f) \le 0$), and $\Xi_\pm(Q, P; f) = 0$ if and only if $P = Q$ or $f$ is deterministic
a.s. with respect to $P$. Due to these properties, $\Xi_+(Q, P; f)$ (and $-\Xi_-(Q, P; f)$) is a goal-oriented
divergence, incorporating the observable $f$ in its definition. Furthermore, $\Xi_\pm(Q, P; f)$ depends on the relative
entropy of $Q$ with respect to $P$ and admits an explicit representation (see Theorem 2.8). We view these
weak error bounds (i.e., errors in averages or expected values for various classes of functions) as Uncertainty
Quantification (UQ) bounds for the observables of interest $f$. Furthermore, the UQ bounds (1.1) characterize
the errors incurred if one uses the more computationally tractable $\mathbb{E}_P[f]$ instead of $\mathbb{E}_Q[f]$. As discussed
in Section 2, UQ bounds of the type (1.1) can be derived from different divergences used to discriminate
between two probability measures $P$ and $Q$. For example, a common choice is based on the Csiszár-Kullback-Pinsker
(CKP) inequality, which bounds the total variation norm by the relative entropy. Another approach
uses the $\chi^2$-divergence (or Pearson divergence) and derives a bound by a direct application of the Cauchy-Schwarz
inequality. The bounds presented in this paper are based on the variational characterization of relative
entropy used in [7]. The variational approach guarantees optimal constants in the estimates and thus tighter
bounds.

In the context of parametrized models the general UQ bounds (1.1) give a tool for estimating the sensitivity of
observables to perturbations in model parameters. More precisely, given a parametric family of probability
measures $P^\theta(d\omega)$, $\theta \in \mathbb{R}^K$, on the common measurable space $(\Omega, \mathcal{B})$, we study bounds on perturbations
of $\mathbb{E}_{P^\theta}[f]$ under changes of $\theta$. The bounds on sensitivity indices for parametric model families $P^\theta$ then
follow by asymptotic expansions of $Q = P^{\theta+\epsilon v}$ in $\epsilon$, which is a straightforward procedure under smoothness
assumptions when the parameter is finite dimensional, i.e., $\theta, v \in \mathbb{R}^K$ and $|v| = 1$. The derived sensitivity
bounds can be viewed as sharp and computable bounds for the weak error of bounded and continuous
functions in cases when the measure $P^\theta$ is approximated by $P^{\theta+\epsilon v}$ under assumptions of smoothness on
the mapping $\theta \mapsto P^\theta$. The mapping defines a finite dimensional submanifold, parametrized by $\theta \in \mathbb{R}^K$,
of the manifold of probability measures $\mathcal{P}(\Omega)$ on $\Omega$. For the sensitivity indices defined by
$S_{f,v}(P^\theta) = \lim_{\epsilon \to 0} \frac{1}{\epsilon}\left(\mathbb{E}_{P^{\theta+\epsilon v}}[f] - \mathbb{E}_{P^\theta}[f]\right)$, we establish estimates of the type

$$|S_{f,v}(P^\theta)| \le \sqrt{\mathrm{Var}_{P^\theta}(f)}\,\sqrt{v^T I(P^\theta)\, v}, \tag{1.2}$$

where $I(P^\theta)$ is the FIM of $P^\theta$. It is worth noting the decomposition of the right hand side of the above
sensitivity bound into the product of two terms, with each term capturing different aspects of the sensitivities.

A primary novelty of the presented results is their application to cases where the model is represented
by a path measure for a Markov process. Thus the proposed UQ and sensitivity bounds are also applicable
for the weak error of path-dependent quantities. With stochastic dynamics in mind, we consider a stochastic
process $\{X_t\}_{t \ge 0}$ with the stationary measure $\mu(dx)$ and a process $\{Y_t\}_{t \ge 0}$ with the initial measure $\nu(dx)$,
and we denote by $P = P_{[0,T]}$, $Q = Q_{[0,T]}$ the respective measures on the path space. As previously, $Q_{[0,T]}$
is viewed as the “true” measure while $P_{[0,T]}$ as the “nominal” model. We consider as an observable a
measurable functional $F(\{X_t\}_{0 \le t \le T})$ of the process. The derived UQ bounds are now set on path space and
characterize the errors incurred when approximating $\mathbb{E}_{Q_{[0,T]}}[F]$ by $\mathbb{E}_{P_{[0,T]}}[F]$:

$$\Xi_-(Q_{[0,T]}, P_{[0,T]}; F) \le \mathbb{E}_{Q_{[0,T]}}[F] - \mathbb{E}_{P_{[0,T]}}[F] \le \Xi_+(Q_{[0,T]}, P_{[0,T]}; F). \tag{1.3}$$

Even though the path UQ bound in (1.3) is a direct consequence of (1.1), it can be further elaborated using
properties and asymptotics of the relative entropy between path distributions, such as the relative entropy
rate (RER), denoted by $\mathcal{H}(Q \,\|\, P)$, which measures the information loss per unit time (for a definition of
RER see (3.1)). The RER for large classes of stochastic dynamics is a computable quantity, [HD], implying in
turn that the bounds in (1.3) are computable using Monte Carlo simulation. Furthermore, in a calculation
reminiscent of the Gärtner-Ellis Theorem, we show that the bounds (1.3) in the $T \to \infty$ limit take the form

$$\Xi_\pm(Q \,\|\, P; F) = G^{\pm}_{P,F}\big(\mathcal{H}(Q \,\|\, P)\big), \tag{1.4}$$

where $G^{\pm}_{P,F}(0) = 0$ and the function $G^{\pm}_{P,F}$ is calculated in terms of the cumulant generating function of the
observable $F$ under the model $P$. Finally, (1.4) demonstrates the key role played by the relative entropy
rate $\mathcal{H}(Q \,\|\, P)$ for uncertainty quantification of stochastic processes and in general for models with correlated
data. These bounds further justify the sensitivity analysis based on relative entropy rate and coarse-graining
methods developed in [30] and respectively.

An implication of the path UQ bounds (1.3) is that sensitivity analysis bounds which are general and valid
in both transient and long-time regimes are possible. In particular, when assuming a parametric family of
path distributions parametrized by $\theta \in \mathbb{R}^K$, and for time-averaged observables of the form
$F(\{X_t\}_{0 \le t \le T}) = \frac{1}{T}\int_0^T f(X_s)\,ds$, we obtain sensitivity bounds such as (1.2) for both transient and long-time regimes. For
example, for the stationary distribution $\mu^\theta$ (unknown for most stochastic dynamics models) we have the
bound

$$|S_{f,v}(\mu^\theta)| \le \sqrt{\tau(f)}\,\sqrt{v^T I_{\mathcal{H}}(P^\theta)\, v}, \tag{1.5}$$

where $\tau(f)$ is the integrated autocorrelation time (IAT). In Monte Carlo simulation, the calculation of the IAT
is a necessary step since it provides the variance of the estimated observable, $F$, [24]. Furthermore, $I_{\mathcal{H}}(P^\theta)$
is the path FIM which corresponds to the Hessian of the RER. The path FIM is also computable for large
classes of stochastic dynamics; for example, for chemical reaction networks the path FIM is a sparse, block-diagonal
matrix, hence all related computations scale linearly with the dimension of the parameter vector
$\theta$, [S]. Therefore, the path FIM is computationally feasible, even for systems with a very high-dimensional
parameter space. For completeness in the presentation, we refer to Appendix A for the RER and the path
FIM formulas for various classes of Markov processes.

We present several examples of the derived sensitivity bounds and demonstrate their tightness. In
particular, for the exponential family of distributions, the sensitivity bound becomes an equality, showing
the sharpness of the bounds. Additionally, we compare the “static” and the path-space sensitivity bounds for
simple Markov processes where the stationary distribution is explicitly known. We note, though, that for
non-equilibrium steady state systems the stationary distribution is generally not known; therefore, comparisons
are not feasible and only the path-space sensitivity bound can be computed.

2. Uncertainty quantification information inequalities and sensitivity bounds. 

2.1. Distances and divergences of probability measures. Bounds of the type (1.1) are based on
characterizing a distance or divergence between the measures, $Q$, $P$, under which the averages are evaluated.
While our primary goal is to characterize the bounds based on relative entropy, other divergences can be
also used to derive similar bounds with different levels of sharpness.

Definition 2.1. The total variation norm between two probability measures $Q$ and $P$ on $(\Omega, \mathcal{B})$ is
defined by

$$\|Q - P\|_{TV} = \sup_{A \in \mathcal{B}} |Q(A) - P(A)|. \tag{2.1}$$

We also consider two pseudo-distances, or divergences in the statistics terminology. 

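Definition 2.2. For two probability measures $Q$, $P$ on $(\Omega, \mathcal{B})$ the relative entropy (information divergence,
Kullback-Leibler divergence) of $Q$ with respect to $P$ is defined by

$$\mathcal{R}(Q \,\|\, P) = \begin{cases} \displaystyle\int \frac{dQ}{dP} \log \frac{dQ}{dP}\, dP, & \text{if } Q \ll P \text{ and } \frac{dQ}{dP} \log \frac{dQ}{dP} \text{ is } P\text{-integrable}, \\ +\infty & \text{otherwise.} \end{cases} \tag{2.2}$$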

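The Kullback-Leibler divergence is a particular case of a family of Csiszár $\phi$-divergences which are functionals
of the form

$$\mathcal{R}_\phi(Q \,\|\, P) = \begin{cases} \displaystyle\int \phi\!\left(\frac{dQ}{dP}(\omega)\right) P(d\omega), & \text{if } Q \ll P \text{ and } \phi\!\left(\frac{dQ}{dP}\right) \text{ is } P\text{-integrable}, \\ +\infty & \text{otherwise,} \end{cases} \tag{2.3}$$

for a convex function $\phi : \mathbb{R}^+ \to \mathbb{R}$ with $\phi(1) = 0$. In the case of the relative entropy we have $\phi(x) = x \log x$.

Another choice of the convex function, $\phi(x) = (x - 1)^2$, gives a member of the $\phi$-divergence family known
as the $\chi^2$-divergence.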

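Definition 2.3. The $\chi^2$-divergence of two probability measures $Q$, $P$ on $(\Omega, \mathcal{B})$ is defined by

$$\chi^2(Q \,\|\, P) = \begin{cases} \displaystyle\int \left(\frac{dQ}{dP}(\omega) - 1\right)^2 P(d\omega), & \text{if } Q \ll P, \\ +\infty & \text{otherwise.} \end{cases} \tag{2.4}$$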
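
As an illustration of Definitions 2.1-2.3, the following minimal sketch evaluates the three quantities on a finite state space, where every integral reduces to a finite sum. The distributions `p` and `q` are hypothetical values chosen only for the example, and the function names are not taken from the paper.

```python
import math


def total_variation(q, p):
    """sup_A |Q(A) - P(A)| on a finite space, i.e. half the L1 distance."""
    return 0.5 * sum(abs(qi - pi) for qi, pi in zip(q, p))


def relative_entropy(q, p):
    """R(Q || P) = sum_i q_i log(q_i / p_i), or +inf if Q is not << P."""
    total = 0.0
    for qi, pi in zip(q, p):
        if qi == 0.0:
            continue  # convention 0 log 0 = 0
        if pi == 0.0:
            return math.inf
        total += qi * math.log(qi / pi)
    return total


def chi_square(q, p):
    """chi^2(Q || P) = sum_i (q_i / p_i - 1)^2 p_i, or +inf if Q is not << P."""
    total = 0.0
    for qi, pi in zip(q, p):
        if pi == 0.0:
            if qi > 0.0:
                return math.inf
            continue
        total += (qi / pi - 1.0) ** 2 * pi
    return total


# Hypothetical example distributions on three states.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
tv = total_variation(q, p)
kl = relative_entropy(q, p)
chi2 = chi_square(q, p)
print(tv, kl, chi2)
```

For this pair the relative entropy is smaller than the $\chi^2$-divergence, which is the ordering that matters when the corresponding bounds are compared at the end of Section 2.2.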


2.2. Information inequalities and goal-oriented divergence. We turn to a variational formulation
that provides sharp weak error estimates in terms of relative entropy. Let $\mathcal{M}(\Omega)$ denote the measurable
functions from $\Omega$ into $\mathbb{R}$ and let $\mathcal{M}_b(\Omega)$ be the subset of functions that are uniformly bounded. For $f \in \mathcal{M}_b(\Omega)$
and $c \in \mathbb{R}$ we introduce the cumulant generating function (logarithmic moment generating function)

$$\Lambda_{P,f}(c) = \log \mathbb{E}_P[e^{cf}] = \log \int e^{cf}\, dP. \tag{2.5}$$

We restrict our analysis to the functions $f$ for which $\Lambda_{P,f}(c)$ is finite at least in a neighborhood of the origin.
More specifically, we have the following definition of the set $\mathcal{E}$.

Definition 2.4. A function $f \in \mathcal{M}(\Omega)$ belongs to the set $\mathcal{E}$ if and only if there exists $c_0 > 0$ such that
$\Lambda_{P,f}(\pm c_0) < \infty$.

The properties of $\Lambda_{P,f}$ then guarantee that $\Lambda_{P,f}(c)$ is finite for all $c \in [-c_0, c_0]$. We note that $\mathbb{E}_P[|f|]$ is
finite for all $f \in \mathcal{E}$. It will be more convenient to work with the cumulant generating function of the centered
observable $\tilde{f} = f - \mathbb{E}_P[f]$:

$$\tilde{\Lambda}_{P,f}(c) = \log \mathbb{E}_P[e^{c(f - \mathbb{E}_P[f])}] = \log \int e^{c\tilde{f}}\, dP. \tag{2.6}$$

Recalling the basic properties of the cumulant generating function for $f \in \mathcal{E}$ that is not essentially
constant, we have that $\tilde{\Lambda}_{P,f}(\cdot)$ is a strictly convex function which is $C^\infty$ in a neighborhood of the origin, with
the derivatives $\tilde{\Lambda}^{(k)}_{P,f}(0)$ defining the cumulants of $f - \mathbb{E}_P[f]$ under $P$. In particular, $\tilde{\Lambda}_{P,f}(0) = \tilde{\Lambda}'_{P,f}(0) = 0$
and $\tilde{\Lambda}''_{P,f}(0) = \mathrm{Var}_P(f)$. The following characterization of exponential integrals is well-known in statistics
and large deviation theory (see e.g., 0). For the sake of completeness we present it here together with a
proof.


Lemma 2.5. Let $f \in \mathcal{M}_b(\Omega)$ and $P$ be a probability measure on $(\Omega, \mathcal{B})$. Then

$$\log \mathbb{E}_P[e^f] = \sup_{Q \ll P} \left\{ \mathbb{E}_Q[f] - \mathcal{R}(Q \,\|\, P) \right\}. \tag{2.7}$$


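Proof. It suffices to consider only $Q$ such that $\mathcal{R}(Q \,\|\, P) < \infty$ in (2.7). Let the probability measure $R$
be defined by $dR/dP = e^f / \mathbb{E}_P[e^f]$. If $\mathcal{R}(Q \,\|\, P) < \infty$, then $Q \ll P$ implies $Q \ll R$. Thus

$$-\mathcal{R}(Q \,\|\, P) + \mathbb{E}_Q[f] = -\mathbb{E}_Q\!\left[\log \frac{dQ}{dP}\right] + \mathbb{E}_Q[f] = -\mathbb{E}_Q\!\left[\log \frac{dQ}{dR}\right] - \mathbb{E}_Q\!\left[\log \frac{dR}{dP}\right] + \mathbb{E}_Q[f] = -\mathcal{R}(Q \,\|\, R) + \log \mathbb{E}_P[e^f].$$

Now use that $\mathcal{R}(Q \,\|\, R) \ge 0$ and $\mathcal{R}(Q \,\|\, R) = 0$ if and only if $Q = R$ [5, Lemma 1.4.1]. This establishes (2.7)
and also shows that $R$ is the supremizing measure. □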
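
The variational formula (2.7) can be checked numerically on a finite state space. The sketch below is an illustration under assumed values (the distribution `p`, the observable `f` and the test measure are hypothetical): it builds the exponentially tilted measure from the proof and confirms that it attains the supremum, while other measures give a smaller value.

```python
import math


def relative_entropy(q, p):
    """R(Q || P) on a finite space; assumes p_i > 0 wherever q_i > 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


def gibbs_objective(q, p, f):
    """The functional E_Q[f] - R(Q || P) appearing inside the supremum in (2.7)."""
    return sum(qi * fi for qi, fi in zip(q, f)) - relative_entropy(q, p)


# Hypothetical reference measure and bounded observable on three states.
p = [0.5, 0.3, 0.2]
f = [1.0, -0.5, 2.0]

# Left-hand side of (2.7).
z = sum(pi * math.exp(fi) for pi, fi in zip(p, f))
log_mgf = math.log(z)

# Tilted measure dR/dP = e^f / E_P[e^f] from the proof.
tilted = [pi * math.exp(fi) / z for pi, fi in zip(p, f)]

value_at_tilted = gibbs_objective(tilted, p, f)
value_at_p = gibbs_objective(p, p, f)
value_at_other = gibbs_objective([0.2, 0.2, 0.6], p, f)
print(log_mgf, value_at_tilted, value_at_p, value_at_other)
```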

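By changing $f$ to $c(f - \mathbb{E}_P[f])$, we obtain a variational formula for the cumulant generating function

$$\tilde{\Lambda}_{P,f}(c) = \sup_{Q \ll P} \left\{ c\,(\mathbb{E}_Q[f] - \mathbb{E}_P[f]) - \mathcal{R}(Q \,\|\, P) \right\}. \tag{2.8}$$

The variational characterization gives us the following upper and lower bounds for $f \in \mathcal{M}_b(\Omega)$ and $c > 0$:

$$\mathbb{E}_Q[f] - \mathbb{E}_P[f] \le \frac{1}{c} \log \mathbb{E}_P[e^{c(f - \mathbb{E}_P[f])}] + \frac{1}{c}\mathcal{R}(Q \,\|\, P), \tag{2.9}$$

$$\mathbb{E}_Q[f] - \mathbb{E}_P[f] \ge -\frac{1}{c} \log \mathbb{E}_P[e^{-c(f - \mathbb{E}_P[f])}] - \frac{1}{c}\mathcal{R}(Q \,\|\, P). \tag{2.10}$$

These inequalities can be extended to any $f \in \mathcal{E}$, and we give the argument for the case of the upper bound
(2.9). Recall that $f \in \mathcal{E}$ implies $\mathbb{E}_P[|f|] < \infty$. If $\mathbb{E}_P[e^{c(f - \mathbb{E}_P[f])}] = \infty$, then (2.9) holds automatically. If
$\mathbb{E}_P[e^{c(f - \mathbb{E}_P[f])}] < \infty$, let $f^{a,b} = [f \vee (-a)] \wedge b$ for $a, b \in \mathbb{R}$, and apply (2.9) with $f - \mathbb{E}_P[f]$ replaced by
$f^{a,b} - \mathbb{E}_P[f]$. First let $a \to \infty$ and use the Monotone Convergence Theorem, and then send $b \to \infty$ and use
the dominating function $e^{c(f - \mathbb{E}_P[f])}$ to obtain (2.9) as written.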

Using these inequalities, tight estimates as in Chowdhary and Dupuis, [7], and Li and Xie, m can be
obtained by optimizing over $c > 0$:

$$\sup_{c > 0} \left\{ -\frac{1}{c}\tilde{\Lambda}_{P,f}(-c) - \frac{1}{c}\mathcal{R}(Q \,\|\, P) \right\} \le \mathbb{E}_Q[f] - \mathbb{E}_P[f] \le \inf_{c > 0} \left\{ \frac{1}{c}\tilde{\Lambda}_{P,f}(c) + \frac{1}{c}\mathcal{R}(Q \,\|\, P) \right\}. \tag{2.11}$$

We refer to upper and lower bounds of this form as Uncertainty Quantification Information Inequalities
(UQII). The corresponding bounds define a new type of divergence that involves the probability measures $P$ and $Q$
as well as the observable $f$, hence we refer to it as a Goal-oriented Divergence. More precisely, based on
(2.11) we give the following definitions.

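Definition 2.6. For any two probability measures $P$ and $Q$ with $\mathcal{R}(Q \,\|\, P) < \infty$ and any observable
$f \in \mathcal{E}$, we define

$$\Xi_+(Q \,\|\, P; f) = \inf_{c > 0} \left\{ \frac{1}{c}\tilde{\Lambda}_{P,f}(c) + \frac{1}{c}\mathcal{R}(Q \,\|\, P) \right\}, \tag{2.12}$$

and similarly

$$\Xi_-(Q \,\|\, P; f) = \sup_{c > 0} \left\{ -\frac{1}{c}\tilde{\Lambda}_{P,f}(-c) - \frac{1}{c}\mathcal{R}(Q \,\|\, P) \right\}. \tag{2.13}$$

Then the UQIIs (2.11) are rewritten as

$$\Xi_-(Q \,\|\, P; f) \le \mathbb{E}_Q[f] - \mathbb{E}_P[f] \le \Xi_+(Q \,\|\, P; f). \tag{2.14}$$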
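
To make Definition 2.6 concrete, the following sketch approximates $\Xi_\pm$ on a finite state space by scanning a logarithmic grid of values $c > 0$. This is an illustrative shortcut, not the procedure of the paper: since (2.9) and (2.10) hold for every $c > 0$, the grid minimum and maximum are still valid bounds, only possibly slightly less tight than the exact infimum and supremum. The distributions and the observable are hypothetical.

```python
import math


def relative_entropy(q, p):
    """R(Q || P) on a finite space; assumes p_i > 0 wherever q_i > 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


def centered_cgf(c, p, f):
    """log E_P[exp(c (f - E_P[f]))], evaluated with a log-sum-exp shift."""
    mean = sum(pi * fi for pi, fi in zip(p, f))
    exponents = [c * (fi - mean) for pi, fi in zip(p, f) if pi > 0.0]
    weights = [pi for pi in p if pi > 0.0]
    shift = max(exponents)
    total = sum(w * math.exp(x - shift) for w, x in zip(weights, exponents))
    return shift + math.log(total)


def goal_oriented_divergences(q, p, f, n_grid=4000):
    """Grid approximations of (Xi_-, Xi_+) from (2.12)-(2.13)."""
    rho2 = relative_entropy(q, p)
    # Log-spaced grid on [1e-4, 1e4]; the range is an assumption of this sketch.
    grid = [10.0 ** (-4.0 + 8.0 * k / (n_grid - 1)) for k in range(n_grid)]
    xi_plus = min((centered_cgf(c, p, f) + rho2) / c for c in grid)
    xi_minus = max((-centered_cgf(-c, p, f) - rho2) / c for c in grid)
    return xi_minus, xi_plus


# Hypothetical nominal model P, "true" model Q, and observable f.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
f = [1.0, -0.5, 2.0]

weak_error = sum((qi - pi) * fi for qi, pi, fi in zip(q, p, f))
xi_minus, xi_plus = goal_oriented_divergences(q, p, f)
print(xi_minus, weak_error, xi_plus)
```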












We next show that $\Xi_+(Q \,\|\, P; f)$ and $-\Xi_-(Q \,\|\, P; f)$ have the properties of a divergence similar to the relative
entropy and the $\chi^2$-divergence. However, the new goal-oriented divergence additionally captures the role
of fluctuations of the observable $f$, as is further quantified in Theorem 2.7 and Theorem 2.11 below. More
specifically we have:

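Theorem 2.7 (Goal-oriented Divergence). Assume that $f \in \mathcal{E}$ and $\mathcal{R}(Q \,\|\, P) < \infty$. Then

(i) $\Xi_+(Q \,\|\, P; f) \ge 0$ and $\Xi_-(Q \,\|\, P; f) \le 0$,

(ii) $\Xi_\pm(Q \,\|\, P; f) = 0$ if and only if $Q = P$ or $f$ is constant $P$-a.s.

Proof. The proofs for $\Xi_+$ and $\Xi_-$ are similar and therefore we prove only the former case.

(i) The proof uses the fact that both terms in the variational definition of $\Xi_+$,

$$\Xi_+(Q \,\|\, P; f) = \inf_{c > 0} \left\{ \frac{1}{c}\tilde{\Lambda}_{P,f}(c) + \frac{1}{c}\mathcal{R}(Q \,\|\, P) \right\},$$

are non-negative. The relative entropy $\mathcal{R}(Q \,\|\, P)$ is a divergence, hence always non-negative, and thus
$\frac{1}{c}\mathcal{R}(Q \,\|\, P) \ge 0$ for all $c > 0$. Furthermore, by Jensen’s inequality,

$$\frac{1}{c}\tilde{\Lambda}_{P,f}(c) = \frac{1}{c}\log \mathbb{E}_P\!\left[e^{c(f - \mathbb{E}_P[f])}\right] \ge \frac{1}{c}\,\mathbb{E}_P\big[c\,(f - \mathbb{E}_P[f])\big] = 0.$$

(ii) If $f = \mathbb{E}_P[f]$ $P$-a.s. then $\tilde{\Lambda}_{P,f}(c) = 0$. Since $\mathcal{R}(Q \,\|\, P) \in [0, \infty)$,

$$\Xi_+(Q \,\|\, P; f) = \inf_{c > 0} \left\{ \frac{1}{c}\mathcal{R}(Q \,\|\, P) \right\} = 0.$$

If $Q = P$ then $\mathcal{R}(Q \,\|\, P) = 0$ and

$$0 \le \Xi_+(Q \,\|\, P; f) = \inf_{c > 0} \left\{ \frac{1}{c}\tilde{\Lambda}_{P,f}(c) \right\} \le \lim_{c \to 0} \frac{1}{c}\tilde{\Lambda}_{P,f}(c) = \tilde{\Lambda}'_{P,f}(0) = 0.$$

For the reverse direction we can assume $\mathcal{R}(Q \,\|\, P) > 0$, since if $\mathcal{R}(Q \,\|\, P) = 0$ the conclusion is automatic.
In this case the infimum must be obtained in the limit $c \to \infty$, so that $\lim_{c \to \infty} \tilde{\Lambda}_{P,f}(c)/c = 0$. We claim
that $\tilde{\Lambda}_{P,f}(c) = 0$ for all $c \in [0, \infty)$. Since $\tilde{\Lambda}_{P,f}(0) = 0$, if $\tilde{\Lambda}_{P,f}(c) > 0$ for some $c \in (0, \infty)$ then $\tilde{\Lambda}'_{P,f}(\bar{c}) > 0$
for some $\bar{c} \in (0, c]$. Convexity then implies $\liminf_{c \to \infty} \tilde{\Lambda}_{P,f}(c)/c \ge \tilde{\Lambda}'_{P,f}(\bar{c}) > 0$, and this contradiction
establishes $\tilde{\Lambda}_{P,f}(c) = 0$ for all $c \in [0, \infty)$. Since $f \in \mathcal{E}$ implies $\tilde{\Lambda}_{P,f}(c)$ is twice continuously differentiable at
$c = 0$, $\mathbb{E}_P[(f - \mathbb{E}_P[f])^2] = \tilde{\Lambda}''_{P,f}(0) = 0$, and therefore $f = \mathbb{E}_P[f]$ $P$-a.s. □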

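Furthermore, we derive an analytic formula for the divergences $\Xi_\pm(Q \,\|\, P; f)$:

Theorem 2.8 (Representation). If $f \in \mathcal{E}$ with $f \ne \mathbb{E}_P[f]$ $P$-a.s. and $\mathcal{R}(Q \,\|\, P) < \infty$ then we have

$$\Xi_+(Q \,\|\, P; f) = \tilde{\Lambda}'_{P,f}\big(\Phi^{-1}(\mathcal{R}(Q \,\|\, P))\big) \quad \text{and} \quad \Xi_-(Q \,\|\, P; f) = \tilde{\Lambda}'_{P,f}\big(-\Phi^{-1}(\mathcal{R}(Q \,\|\, P))\big), \tag{2.15}$$

where

$$\Phi(c) := -\tilde{\Lambda}_{P,f}(c) + c\,\tilde{\Lambda}'_{P,f}(c)$$

is a strictly increasing function on $(0, \bar{c})$, and where $\bar{c} = \sup\{c : \tilde{\Lambda}_{P,f}(c) < \infty\}$.

Proof. Let $\theta_+(c; \rho) = \frac{1}{c}\tilde{\Lambda}_{P,f}(c) + \frac{1}{c}\rho^2$, where $\rho^2 = \mathcal{R}(Q \,\|\, P)$. Then

$$\Xi_+(Q \,\|\, P; f) = \inf_{c > 0} \theta_+(c; \rho). \tag{2.16}$$

We use that $\tilde{\Lambda}_{P,f}(0) = \tilde{\Lambda}'_{P,f}(0) = 0$, and that $f \ne \mathbb{E}_P[f]$ $P$-a.s. implies $\tilde{\Lambda}_{P,f}$ is strictly convex. If $\rho \ne 0$
then $\theta_+(c; \rho)$ tends to $\infty$ as $c \downarrow 0$ and as $c \uparrow \infty$. Hence the infimum is achieved. Suppose an infimum of
$A > 0$ is achieved at two points $0 < c_1 < c_2 < \infty$, so that $\tilde{\Lambda}_{P,f}(c_i) + \rho^2 = c_i A$, $i = 1, 2$. If $\bar{c} = (c_1 + c_2)/2$,
then the strict convexity of $\tilde{\Lambda}_{P,f}$ implies $\tilde{\Lambda}_{P,f}(\bar{c}) + \rho^2 < \bar{c} A$. This contradicts the minimality at $c_i$, and
thus shows the minimizer is unique. Since $\tilde{\Lambda}_{P,f}(0) = \tilde{\Lambda}'_{P,f}(0) = 0$ we can continuously extend the function
$\theta_+(c, 0)$ to $c = 0$ by $\theta_+(0, 0) = 0$. Then by direct calculation and lower semicontinuity the optimization
problem in (2.16) extended to $c \ge 0$ has the unique minimizer $c^*(0) = 0$ with the minimum value equal to 0.
Then $\inf_{c \ge 0} \theta_+(c; \rho)$ is well defined and achieves the infimum for all $\rho \in \mathbb{R}$. Since $\tilde{\Lambda}_{P,f}(\cdot)$ is a proper convex
function and $C^\infty$ in its domain of finiteness we have, for all $\rho \in \mathbb{R}$, the optimality condition

$$-\frac{1}{c^2}\tilde{\Lambda}_{P,f}(c) + \frac{1}{c}\tilde{\Lambda}'_{P,f}(c) - \frac{1}{c^2}\rho^2 = 0. \tag{2.17}$$

Multiplying (2.17) by $c^2$, we obtain that the minimizer $c^* = c^*(\rho)$ satisfies

$$-\tilde{\Lambda}_{P,f}(c^*) + c^*\,\tilde{\Lambda}'_{P,f}(c^*) = \rho^2. \tag{2.18}$$

We will use that $\tilde{\Lambda}_{P,f}(c)$ is a log moment generating function with $\tilde{\Lambda}_{P,f}(c) < \infty$ for $c$ in an open neighborhood
of zero and $\tilde{\Lambda}'_{P,f}(0) = 0$. These imply that if $\Lambda^*(t)$ is the Legendre-Fenchel transform of $\tilde{\Lambda}_{P,f}$, i.e.,
$\Lambda^*(t) = \sup_{c > 0}\{ct - \tilde{\Lambda}_{P,f}(c)\}$, then $\Lambda^*(t)$ has its unique minimum at $t = 0$, and $\Lambda^*(t) \to \infty$ as $t \to \infty$. If $t(c)$ is the
unique solution of $(\Lambda^*)'(t) = c$, then it follows from convex duality that

$$\Phi(c) = -\tilde{\Lambda}_{P,f}(c) + c\,\tilde{\Lambda}'_{P,f}(c) = \Lambda^*(t(c))$$

is strictly increasing and maps $(0, \bar{c})$ onto $(0, \infty)$. Therefore, from (2.18) we have

$$c^* = c^*(\rho) = \Phi^{-1}(\rho^2). \tag{2.19}$$

Substituting in (2.16) and using (2.18), we have that

$$\Xi_+(Q \,\|\, P; f) = \theta_+(c^*(\rho); \rho) = \tilde{\Lambda}'_{P,f}(c^*(\rho)) = \tilde{\Lambda}'_{P,f}\big(\Phi^{-1}(\rho^2)\big). \tag{2.20}$$

The representation of the lower bound $\Xi_-(Q \,\|\, P; f) = \tilde{\Lambda}'_{P,f}\big(-\Phi^{-1}(\rho^2)\big)$ is computed in a similar way. □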
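
As a worked illustration of the representation (2.15), consider the (assumed, not taken from the paper) case in which the observable $f$ is Gaussian under $P$ with variance $\sigma^2$, so that its centered cumulant generating function is quadratic and $\bar{c} = \infty$. The symbols $\sigma$, $m$ and $\delta$ below are introduced only for this example.

```latex
% Centered CGF of a Gaussian observable with variance \sigma^2 under P:
\tilde{\Lambda}_{P,f}(c) = \tfrac{1}{2}\sigma^2 c^2,
\qquad
\tilde{\Lambda}'_{P,f}(c) = \sigma^2 c .

% The function \Phi of Theorem 2.8 and its inverse on (0,\infty):
\Phi(c) = -\tfrac{1}{2}\sigma^2 c^2 + c\,\sigma^2 c = \tfrac{1}{2}\sigma^2 c^2,
\qquad
\Phi^{-1}(r) = \frac{\sqrt{2r}}{\sigma} .

% Substituting into (2.15):
\Xi_\pm(Q \,\|\, P; f)
  = \tilde{\Lambda}'_{P,f}\bigl(\pm\Phi^{-1}(\mathcal{R}(Q \,\|\, P))\bigr)
  = \pm\,\sigma\sqrt{2\,\mathcal{R}(Q \,\|\, P)} .

% Tightness check: P = N(m,\sigma^2), Q = N(m+\delta,\sigma^2), f(x) = x, \delta > 0:
\mathcal{R}(Q \,\|\, P) = \frac{\delta^2}{2\sigma^2}
\quad\Longrightarrow\quad
\Xi_+(Q \,\|\, P; f) = \delta = \mathbb{E}_Q[f] - \mathbb{E}_P[f] .
```

In this example the upper bound in (2.14) is attained, and the leading term of the linearization in Theorem 2.11 is exact.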
From the proof above we deduce that the dependence on the cumulant generating function of $f$ can be
removed if a bound is available. Note that if $\Psi : \mathbb{R} \to \mathbb{R}$ is convex with a minimum of zero at the origin,
then in the definition of $\Psi^*(t)$, its Legendre-Fenchel transform, the supremum can be restricted to $(0, \infty)$.

Corollary 2.9. Let $\Psi : \mathbb{R} \to \mathbb{R}$ be a convex and continuously differentiable function such that $\Psi(0) =
\Psi'(0) = 0$ and

$$\tilde{\Lambda}_{P,f}(c) = \log \mathbb{E}_P[e^{c(f - \mathbb{E}_P[f])}] \le \Psi(c),$$

and define $(\Psi^*)^{-1}(t)$ as the (generalized) inverse of the Legendre-Fenchel transform $\Psi^*(t) =
\sup_{c > 0}\{ct - \Psi(c)\}$ of the function $\Psi$. Then

$$\mathbb{E}_Q[f] - \mathbb{E}_P[f] \le (\Psi^*)^{-1}\big(\mathcal{R}(Q \,\|\, P)\big). \tag{2.21}$$


We end this section by relating the derived bounds to existing information-theoretic inequalities. The
Csiszár-Kullback-Pinsker inequality states that (for proofs see, e.g., [38])

$$\|Q - P\|_{TV} \le \sqrt{2\,\mathcal{R}(Q \,\|\, P)}. \tag{2.22}$$

Using $\|Q - P\|_{TV} = \sup_{\|f\|_\infty \le 1}\{\mathbb{E}_Q[f] - \mathbb{E}_P[f]\}$ and the Csiszár-Kullback-Pinsker inequality (2.22) we obtain

$$|\mathbb{E}_Q[f] - \mathbb{E}_P[f]| \le \|f\|_\infty \sqrt{2\,\mathcal{R}(Q \,\|\, P)}. \tag{2.23}$$

The constant in front of the pseudo-distance can be improved by using the $\chi^2$-divergence instead of the
relative entropy. Observing that $|\mathbb{E}_P[f] - \mathbb{E}_Q[f]| = \left|\int (f - \mathbb{E}_P[f])\left(\frac{dQ}{dP} - 1\right) dP\right|$ and applying the Cauchy-Schwarz
inequality to the right-hand side we have, for $P$, $Q$ two probability measures on $(\Omega, \mathcal{B})$ with $Q \ll P$ and
$f \in \mathcal{M}_b(\Omega)$,

$$|\mathbb{E}_P[f] - \mathbb{E}_Q[f]| \le \sqrt{\mathrm{Var}_P(f)}\,\sqrt{\chi^2(Q \,\|\, P)}. \tag{2.24}$$
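
The two classical bounds (2.23) and (2.24) are straightforward to evaluate on a finite state space. The sketch below uses hypothetical distributions and a hypothetical observable, and simply compares each bound with the actual weak error; it is an illustration, not a computation from the paper.

```python
import math


def relative_entropy(q, p):
    """R(Q || P) on a finite space; assumes p_i > 0 wherever q_i > 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


def chi_square(q, p):
    """chi^2(Q || P) on a finite space; assumes all p_i > 0."""
    return sum((qi / pi - 1.0) ** 2 * pi for qi, pi in zip(q, p))


def variance(p, f):
    mean = sum(pi * fi for pi, fi in zip(p, f))
    return sum(pi * (fi - mean) ** 2 for pi, fi in zip(p, f))


def ckp_bound(q, p, f):
    """Right-hand side of (2.23): ||f||_inf * sqrt(2 R(Q || P))."""
    return max(abs(fi) for fi in f) * math.sqrt(2.0 * relative_entropy(q, p))


def chi_square_bound(q, p, f):
    """Right-hand side of (2.24): sqrt(Var_P(f)) * sqrt(chi^2(Q || P))."""
    return math.sqrt(variance(p, f)) * math.sqrt(chi_square(q, p))


# Hypothetical nominal model P, "true" model Q, and observable f.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
f = [1.0, -0.5, 2.0]

weak_error = abs(sum((qi - pi) * fi for qi, pi, fi in zip(q, p, f)))
bound_ckp = ckp_bound(q, p, f)
bound_chi2 = chi_square_bound(q, p, f)
print(weak_error, bound_chi2, bound_ckp)
```

For these particular values the $\chi^2$ bound is the smaller of the two, because it replaces $\|f\|_\infty$ by the standard deviation of $f$ under $P$.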

However, this bound is weaker than the newly derived bound (2.14), since in general $\mathcal{R}(Q \,\|\, P) \le \chi^2(Q \,\|\, P)$.

2.3. Linearization of the UQ bounds. The UQ bounds (2.14) and the representations (2.15) can
be made more explicit in terms of the asymptotic expansion at $\mathcal{R}(Q \,\|\, P) = 0$, i.e., when $Q$ is a perturbation
of $P$. We first prove an asymptotic expansion for the solution of the optimization problems in (2.11).

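Lemma 2.10. For two probability measures $P$, $Q$ on $(\Omega, \mathcal{B})$ set $\rho^2 = \mathcal{R}(Q \,\|\, P)$. Assume $\rho^2 < \infty$ and
that $f \in \mathcal{E}$ with $f \ne \mathbb{E}_P[f]$ $P$-a.s. Then there exists a function $c^*(\rho)$ which is the unique solution of

$$(P_+) \qquad \inf_{c > 0} \left\{ \frac{1}{c}\tilde{\Lambda}_{P,f}(c) + \frac{1}{c}\mathcal{R}(Q \,\|\, P) \right\}$$

as well as

$$(P_-) \qquad \sup_{c > 0} \left\{ -\frac{1}{c}\tilde{\Lambda}_{P,f}(-c) - \frac{1}{c}\mathcal{R}(Q \,\|\, P) \right\}.$$

Furthermore, there is $\rho_0 > 0$ such that the optimal solution $c^*(\rho)$ is $C^\infty$ in $(0, \rho_0)$ and admits the expansion

$$c^*(\rho) = c_1^*\,\rho + O(\rho^2), \tag{2.25}$$

where

$$c_1^* = \sqrt{\frac{2}{\mathrm{Var}_P(f)}}. \tag{2.26}$$

Proof. We first solve $(P_+)$. Let $\theta_+(c; \rho) = \frac{1}{c}\tilde{\Lambda}_{P,f}(c) + \frac{1}{c}\rho^2$. Following Theorem 2.8 we obtain the
optimality condition (2.17). Multiplying (2.17) by $c$, we define

$$G(c, \rho) := -\frac{1}{c}\tilde{\Lambda}_{P,f}(c) + \tilde{\Lambda}'_{P,f}(c) - \frac{1}{c}\rho^2. \tag{2.27}$$

Next, we apply the Implicit Function Theorem at $c = 0$, $\rho = 0$ as follows; first we have that

$$\frac{\partial}{\partial c} G(c, 0) = \frac{1}{2}\tilde{\Lambda}''_{P,f}(0) + O(c),$$

and thus obtain

$$\frac{\partial G}{\partial c}(0, 0) = \frac{1}{2}\mathrm{Var}_P(f).$$

Since $\mathrm{Var}_P(f) > 0$, by the Implicit Function Theorem there exists a unique solution $c^*(\rho) > 0$, $c^*(0) = 0$
of $G(c, \rho) = 0$ (and thus of (2.17)) and $c^*(\rho) \in C^\infty$ for $\rho$ in a neighborhood of the origin. Differentiating
$G(c^*(\rho), \rho) = 0$ and setting $\rho = 0$ yields terms in the Taylor expansion of $c^*(\rho)$. In particular, using the
notation $\dot{c}^* = dc^*/d\rho$, we have $c^*(\rho)\,\tilde{\Lambda}''_{P,f}(c^*(\rho))\,\dot{c}^*(\rho) = 2\rho$, and thus by setting $\dot{c}^*(0) = \lim_{\rho \to 0^+} \dot{c}^*(\rho)$ we
have $(\dot{c}^*(0))^2 = 2/\tilde{\Lambda}''_{P,f}(0)$, which concludes the proof by observing again that $\mathrm{Var}_P(f) = \tilde{\Lambda}''_{P,f}(0)$.

To prove that $c^*(\rho)$ is also the solution of $(P_-)$ we observe that

$$\sup_{c > 0} \left\{ -\frac{1}{c}\tilde{\Lambda}_{P,f}(-c) - \frac{1}{c}\mathcal{R}(Q \,\|\, P) \right\} = -\inf_{c > 0} \left\{ \frac{1}{c}\tilde{\Lambda}_{P,f}(-c) + \frac{1}{c}\mathcal{R}(Q \,\|\, P) \right\},$$

and using the same arguments as for $(P_+)$ we conclude that the unique solution is obtained as the solution
of the optimality condition

$$-\frac{1}{c^2}\tilde{\Lambda}_{P,f}(-c) - \frac{1}{c}\tilde{\Lambda}'_{P,f}(-c) - \frac{1}{c^2}\rho^2 = 0, \qquad c > 0,$$

which is, under the change of the variable $c \to -c$, the same as (2.17) and thus analogous calculations yield
the result. □

Next, substituting the expansion in $\rho$ for the optimal value (2.26) we obtain asymptotics in $\rho^2 =
\mathcal{R}(Q \,\|\, P)$ of the upper and lower bounds for the UQ error (2.14).

Theorem 2.11 (Linearization). Under the assumption that $f \in \mathcal{E}$ with $f \ne \mathbb{E}_P[f]$ $P$-a.s., we have

(i) the asymptotic expansion $\Xi_\pm(Q \,\|\, P; f) = \pm\sqrt{\mathrm{Var}_P(f)}\,\sqrt{2\,\mathcal{R}(Q \,\|\, P)} + O(\mathcal{R}(Q \,\|\, P))$, and,

(ii) an estimate of the weak error

$$|\mathbb{E}_Q[f] - \mathbb{E}_P[f]| \le \sqrt{\mathrm{Var}_P(f)}\,\sqrt{2\,\mathcal{R}(Q \,\|\, P)} + O(\mathcal{R}(Q \,\|\, P)). \tag{2.28}$$

If needed, the term $O(\mathcal{R}(Q \,\|\, P))$ can be further resolved using the asymptotic expansions of $c^*(\rho)$ and
$\theta_\pm(c; \rho)$ defined in Lemma 2.10 in terms of $\rho^2 = \mathcal{R}(Q \,\|\, P)$.

Proof. The proof follows from the Taylor expansion of (2.15) in $\rho$, where $\rho^2 = \mathcal{R}(Q \,\|\, P)$, around $\rho = 0$.
First, we note that $\tilde{\Lambda}_{P,f}(0) = \tilde{\Lambda}'_{P,f}(0) = 0$ and $\tilde{\Lambda}''_{P,f}(0) = \mathrm{Var}_P(f)$. Therefore $\Phi^{-1}(0) = 0$, and the upper
bound becomes

$$\Xi_+(Q \,\|\, P; f) = \tilde{\Lambda}'_{P,f}\big(\Phi^{-1}(\rho^2)\big) = \tilde{\Lambda}'_{P,f}(0) + \tilde{\Lambda}''_{P,f}(0)\,\Phi^{-1}(\rho^2) + O\big(|\Phi^{-1}(\rho^2)|^2\big).$$

We conclude using (2.19) and the expansion (2.25). □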
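
The linearization can be checked numerically by evaluating the exact representation (2.15) and comparing it with the leading term $\sqrt{\mathrm{Var}_P(f)}\sqrt{2\mathcal{R}(Q \,\|\, P)}$. The sketch below is an illustration on a finite state space with hypothetical data; it inverts $\Phi$ by bisection, which is a choice made for this example and relies on $\Phi$ being strictly increasing.

```python
import math


def tilted_moments(c, p, f):
    """Return (Lambda(c), Lambda'(c)) for the centered observable under P."""
    mean = sum(pi * fi for pi, fi in zip(p, f))
    weights = [pi * math.exp(c * (fi - mean)) for pi, fi in zip(p, f)]
    z = sum(weights)
    cgf = math.log(z)
    dcgf = sum(w * (fi - mean) for w, fi in zip(weights, f)) / z
    return cgf, dcgf


def phi(c, p, f):
    """Phi(c) = -Lambda(c) + c Lambda'(c) from Theorem 2.8."""
    cgf, dcgf = tilted_moments(c, p, f)
    return c * dcgf - cgf


def xi_plus(rho2, p, f, c_hi=1.0, n_iter=200):
    """Lambda'(Phi^{-1}(rho2)) via bisection; assumes Phi(c_hi) >= rho2."""
    if phi(c_hi, p, f) < rho2:
        raise ValueError("increase c_hi: Phi(c_hi) < rho2")
    lo, hi = 0.0, c_hi
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if phi(mid, p, f) < rho2:
            lo = mid
        else:
            hi = mid
    return tilted_moments(hi, p, f)[1]


# Hypothetical nominal model P and a small perturbation Q of it.
p = [0.5, 0.3, 0.2]
q = [0.49, 0.31, 0.2]
f = [1.0, -0.5, 2.0]

rho2 = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))
mean_p = sum(pi * fi for pi, fi in zip(p, f))
var_p = sum(pi * (fi - mean_p) ** 2 for pi, fi in zip(p, f))

exact_bound = xi_plus(rho2, p, f)
leading_term = math.sqrt(var_p) * math.sqrt(2.0 * rho2)
weak_error = sum((qi - pi) * fi for qi, pi, fi in zip(q, p, f))
print(weak_error, exact_bound, leading_term)
```

The difference between `exact_bound` and `leading_term` is of the order of `rho2`, consistent with the remainder term in Theorem 2.11.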

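2.4. Sensitivity bounds and perturbation analysis. In this section we consider a smooth parametric
family of probability measures $P^\theta$, $\theta \in \mathbb{R}^K$, and assume the following (mild) condition.

Condition 2.1. There is a fixed reference probability measure $R \in \mathcal{P}(\Omega)$ such that $P^\theta \ll R$ for all
$\theta \in \mathbb{R}^K$. Let $p^\theta(\omega) = \frac{dP^\theta}{dR}(\omega)$. Then there is a measurable set $N$ such that $R(N) = 0$, and such that for
all $\omega \notin N$ the mapping $\theta \to p^\theta(\omega)$ from $\mathbb{R}^K$ to $(0, \infty)$ is sufficiently smooth. Where needed, we also assume the existence of
suitable dominating functions for various functions of $p^\theta$.

Under Condition 2.1 the relative entropy can be expressed as

$$\mathcal{R}(P^{\theta+v} \,\|\, P^\theta) = \int p^{\theta+v}(\omega) \log \frac{p^{\theta+v}(\omega)}{p^\theta(\omega)}\, R(d\omega).$$

Using the Taylor expansion and the fact $\int [\partial_\theta \log p^\theta(\omega)]\, p^\theta(\omega)\, R(d\omega) = 0$, we have the perturbative expansion

$$\mathcal{R}(P^{\theta+v} \,\|\, P^\theta) = \frac{1}{2} \sum_{i,j} v_i v_j \int \frac{1}{p^\theta(\omega)} [\partial_{\theta_i} p^\theta(\omega)][\partial_{\theta_j} p^\theta(\omega)]\, R(d\omega) + O(|v|^3). \tag{2.29}$$

The leading term in this expansion is a quadratic form defined by the FIM

$$I(P^\theta)_{ij} = \int \frac{1}{p^\theta(\omega)} [\partial_{\theta_i} p^\theta(\omega)][\partial_{\theta_j} p^\theta(\omega)]\, R(d\omega) = -\int [\partial^2_{\theta_i \theta_j} \log p^\theta(\omega)]\, p^\theta(\omega)\, R(d\omega). \tag{2.30}$$

We apply the derived bounds of Theorem 2.11 for the weak error in order to obtain bounds on the sensitivity
indices (when the derivatives exist):

$$S_{f,v}(P^\theta) = \lim_{\epsilon \to 0} \frac{1}{\epsilon}\left(\mathbb{E}_{P^{\theta+\epsilon v}}[f] - \mathbb{E}_{P^\theta}[f]\right). \tag{2.31}$$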

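Lemma 2.12. Assume Condition 2.1 and let $v \in \mathbb{R}^K$.

(i) Then

$$\mathcal{R}(P^{\theta+v} \,\|\, P^\theta) = \frac{1}{2} \sum_{i,j} I(P^\theta)_{ij}\, v_i v_j + O(|v|^3). \tag{2.32}$$

(ii) Assume also that $f \in \mathcal{E}$, and thus the cumulant generating function $\tilde{\Lambda}_{P^\theta,f}(c) = \log \mathbb{E}_{P^\theta}[e^{c(f - \mathbb{E}_{P^\theta}[f])}]$ is finite
in a neighborhood of the origin, and that $f \ne \mathbb{E}_{P^\theta}[f]$ $P^\theta$-a.s. Then for $v \in \mathbb{R}^K$ and $\epsilon$ in a neighborhood of
the origin there exists a function $c^*(\epsilon)$ which is the unique solution of

$$(P_+) \qquad \inf_{c > 0} \left\{ \frac{1}{c}\tilde{\Lambda}_{P^\theta,f}(c) + \frac{1}{c}\mathcal{R}(P^{\theta+\epsilon v} \,\|\, P^\theta) \right\},$$

as well as

$$(P_-) \qquad \sup_{c > 0} \left\{ -\frac{1}{c}\tilde{\Lambda}_{P^\theta,f}(-c) - \frac{1}{c}\mathcal{R}(P^{\theta+\epsilon v} \,\|\, P^\theta) \right\}.$$

Furthermore the function $c^*(\epsilon)$ admits the perturbation expansion

$$c^*(\epsilon) = c_1^*\,\epsilon + O(\epsilon^2), \tag{2.33}$$

where

$$c_1^* = \sqrt{\frac{\sum_{i,j} I(P^\theta)_{ij}\, v_i v_j}{\mathrm{Var}_{P^\theta}(f)}}. \tag{2.34}$$

Proof. The claim in (i) follows from (2.29) and (2.30). The claim in (ii) follows directly from Lemma 2.10
after expanding the relative entropy in $\epsilon$, i.e., writing $\rho^2(\epsilon) = \frac{\epsilon^2}{2} \sum_{i,j} I(P^\theta)_{ij}\, v_i v_j + O(\epsilon^3)$. Substituting in
(2.25) and (2.26) we obtain (2.33) and (2.34). □

As a direct consequence of Theorem 2.11 we obtain a bound on the sensitivity indices by substituting
$c^*(\epsilon)$ from (2.33) into $\theta_\pm(c, \rho)$ (see the proof of Lemma 2.10).

Theorem 2.13. Under the assumptions of Lemma 2.12 it holds that for $v \in \mathbb{R}^K$ and $\epsilon \ne 0$

$$\frac{1}{|\epsilon|}\left|\mathbb{E}_{P^{\theta+\epsilon v}}[f] - \mathbb{E}_{P^\theta}[f]\right| \le \sqrt{\mathrm{Var}_{P^\theta}(f)}\,\sqrt{\sum_{i,j} I(P^\theta)_{ij}\, v_i v_j} + O(\epsilon), \tag{2.35}$$

and

$$|S_{f,v}(P^\theta)| \le \sqrt{\mathrm{Var}_{P^\theta}(f)}\,\sqrt{\sum_{i,j} I(P^\theta)_{ij}\, v_i v_j}. \tag{2.36}$$

We refer to the inequality (2.36) as a sensitivity bound of the sensitivity index $S_{f,v}(P^\theta)$.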

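Remark 2.1. The bound (2.35) on the sensitivity index is a direct consequence of more general
non-infinitesimal bounds such as Theorem 2.11. We note that in the special case of sensitivity analysis, where
we consider small perturbations in the parameter space, we can obtain sensitivity bounds of the same form
as (2.36) directly from the Cauchy-Schwarz inequality:

$$\left|\frac{d}{d\epsilon}\mathbb{E}_{P^{\theta+\epsilon v}}[f]\Big|_{\epsilon=0}\right| = \left|\int \big(f(\omega) - \mathbb{E}_{P^\theta}[f]\big)\, \frac{d}{d\epsilon}\log p^{\theta+\epsilon v}(\omega)\Big|_{\epsilon=0}\, P^\theta(d\omega)\right|$$

$$\le \sqrt{\int \big(f(\omega) - \mathbb{E}_{P^\theta}[f]\big)^2 P^\theta(d\omega)}\; \sqrt{\int \left(\frac{d}{d\epsilon}\log p^{\theta+\epsilon v}(\omega)\Big|_{\epsilon=0}\right)^2 P^\theta(d\omega)} \tag{2.37}$$

$$= \sqrt{\mathrm{Var}_{P^\theta}(f)}\,\sqrt{\sum_{i,j} I(P^\theta)_{ij}\, v_i v_j}.$$

Finally, we can also use (2.24) applied to $P^\theta$ and $P^{\theta+\epsilon v}$ and obtain the same bound as in (2.36).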
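
The sensitivity bound (2.36) can be illustrated on a small exponential family, for which the FIM equals the covariance matrix of the sufficient statistics. The sketch below is a hypothetical example (a three-state family with two parameters; the statistics `STATS`, the parameter values and the observables are assumptions made here). It estimates the sensitivity index by a central finite difference and compares it with the bound; for an observable that is linear in the sufficient statistics along the direction $v$, the bound becomes an equality.

```python
import math

# Sufficient statistics t(x) for three states (assumed for this example).
STATS = [(1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]


def probs(theta):
    """Exponential family p^theta(x) proportional to exp(theta . t(x))."""
    weights = [math.exp(theta[0] * t[0] + theta[1] * t[1]) for t in STATS]
    z = sum(weights)
    return [w / z for w in weights]


def mean(p, g):
    return sum(pi * gi for pi, gi in zip(p, g))


def variance(p, g):
    m = mean(p, g)
    return mean(p, [(gi - m) ** 2 for gi in g])


def fisher_quadratic_form(theta, v):
    """v^T I(P^theta) v, using I = Cov(t) for an exponential family."""
    projected = [v[0] * t[0] + v[1] * t[1] for t in STATS]
    return variance(probs(theta), projected)


def sensitivity_fd(theta, v, f, eps=1e-5):
    """Central finite-difference estimate of the sensitivity index (2.31)."""
    up = mean(probs([theta[0] + eps * v[0], theta[1] + eps * v[1]]), f)
    down = mean(probs([theta[0] - eps * v[0], theta[1] - eps * v[1]]), f)
    return (up - down) / (2.0 * eps)


def sensitivity_bound(theta, v, f):
    """Right-hand side of (2.36)."""
    var_f = variance(probs(theta), f)
    return math.sqrt(var_f) * math.sqrt(fisher_quadratic_form(theta, v))


theta = [0.3, -0.2]
v = [0.6, 0.8]                 # unit direction in parameter space
f = [1.0, -0.5, 2.0]           # generic observable
f_linear = [v[0] * t[0] + v[1] * t[1] for t in STATS]  # f = v . t

s_generic = sensitivity_fd(theta, v, f)
b_generic = sensitivity_bound(theta, v, f)
s_linear = sensitivity_fd(theta, v, f_linear)
b_linear = sensitivity_bound(theta, v, f_linear)
print(s_generic, b_generic, s_linear, b_linear)
```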

3. Path-space UQ information inequalities and sensitivity bounds. In this section we develop 
new uncertainty quantification information inequalities and related sensitivity bounds for stochastic processes 
and their path-dependent observables. The approach developed in the previous section is applicable to 
obtaining similar bounds for functionals of Markov processes, when combined with path-space Information 
Theory tools such as the RER and the associated path FIM. These concepts which are discussed next were 
introduced as UQ and sensitivity analysis tools for stochastic processes in [sniEiiii]. 

3.1. Information theory metrics in path space. We consider stochastic processes which are Markov
and take values in a Polish space $\mathcal{X}$, although a much more general setup is also possible, see for instance
[22]. For simplicity in the presentation, we further restrict our discussion to discrete-time Markov processes
$\{X_t\}_{t \in \mathbb{N}_0}$, where $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$, with the transition kernel $p(x, dy)$ and with the initial measure $\mu(dx)$, and
the Markov process $\{Y_t\}_{t \in \mathbb{N}_0}$ with the transition kernel $q(x, dy)$ and with the stationary measure $\nu(dx)$. For
the time interval $0, 1, \ldots, T$, we denote by $P_{[0,T]}$, $Q_{[0,T]}$ the respective probability measures on path space.
Similar notation and constructions for all concepts introduced here will also be used when $t \in [0, \infty)$; we
refer to Appendix A, as well as to [22l 1^ .


















































We will assume conditions under which the path-space relative entropy

$$\mathcal{R}\big(Q_{[0,T]}\,\|\,P_{[0,T]}\big)$$

is finite for all $T<\infty$. For stationary Markov processes, the relative entropy scales linearly in $T$ as $T\to\infty$
[22]. Thus it is natural to define the concept of the rate of the relative entropy between path distributions.

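Definition 3.1. Let $P_{[0,T]}$ and $Q_{[0,T]}$ be path-measures corresponding to Markov processes $\{X_t\}_{t\in\mathbb{N}_0}$,
$\{Y_t\}_{t\in\mathbb{N}_0}$. We define the relative entropy rate by

$$\mathcal{H}(Q\,\|\,P) = \lim_{T\to\infty}\frac{1}{T}\,\mathcal{R}\big(Q_{[0,T]}\,\|\,P_{[0,T]}\big), \tag{3.1}$$

when the limit exists.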

Although RER is a quantity between path distributions, we drop the dependence of time interval in the 
notation of the RER because RER is a time-independent quantity. Moreover, the relative entropy rate can 
often be expressed explicitly, which we demonstrate via examples in Appendix A. For instance, in the case 
of discrete-time Markov Chains we have 

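$$\mathcal{H}(Q\,\|\,P) = \int_{\mathcal{X}}\int_{\mathcal{X}} q(x,dy)\,\log\frac{dq(x,\cdot)}{dp(x,\cdot)}(y)\;\nu(dx)
= \int_{\mathcal{X}} \mathcal{R}\big(q(x,\cdot)\,\|\,p(x,\cdot)\big)\,\nu(dx). \tag{3.2}$$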
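For a finite state space, (3.2) reduces to a double sum that can be evaluated directly. The following is a minimal sketch under our own assumptions: the chains are given as row-stochastic matrices `Q` and `P` (hypothetical example values), `Q` is irreducible and aperiodic so that power iteration converges to its stationary distribution, and `p(x,y) > 0` wherever `q(x,y) > 0`. The function names are illustrative and do not come from the references.

```python
import math

def stationary_distribution(K, iters=10_000, tol=1e-14):
    """Stationary row vector of a row-stochastic matrix K, by power iteration."""
    n = len(K)
    nu = [1.0 / n] * n
    for _ in range(iters):
        new = [sum(nu[x] * K[x][y] for x in range(n)) for y in range(n)]
        if max(abs(a - b) for a, b in zip(new, nu)) < tol:
            return new
        nu = new
    return nu

def relative_entropy_rate(Q, P):
    """H(Q||P) = sum_x nu(x) sum_y q(x,y) log(q(x,y)/p(x,y)), cf. (3.2)."""
    nu = stationary_distribution(Q)  # stationary measure of the Q-chain
    rate = 0.0
    for x, row in enumerate(Q):
        for y, q in enumerate(row):
            if q > 0.0:  # terms with q(x,y) = 0 contribute nothing
                rate += nu[x] * q * math.log(q / P[x][y])
    return rate

# Hypothetical two-state example
P = [[0.90, 0.10], [0.20, 0.80]]
Q = [[0.85, 0.15], [0.25, 0.75]]
print(relative_entropy_rate(Q, P))
```

Only the transition probabilities and the stationary measure of the first argument enter, which is why the RER can be estimated from local dynamics.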

The significance of the definition of RER is elucidated by the following property of the relative entropy 
of two path-measures for stationary processes. We state it for simplicity in the case of discrete-time Markov 
Chains, in which case it follows from the chain rule for relative entropy. For the proof we refer to Appendix
A. The proof was first given by Shannon in [35] and since then has been extended in various directions for
Markov and semi-Markov processes [22].

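Lemma 3.2. Let $\{X_t\}_{t\in\mathbb{N}_0}$, $\{Y_t\}_{t\in\mathbb{N}_0}$ be two stationary Markov chains with the path-measures $P_{[0,T]}$
and $Q_{[0,T]}$. Suppose that $\nu$ is a stationary distribution for $\{Y_t\}$ and that the initial distribution $\mu$ of $\{X_t\}$ is
arbitrary. Then for any $T\in\mathbb{N}_0$

$$\mathcal{R}\big(Q_{[0,T]}\,\|\,P_{[0,T]}\big) = T\,\mathcal{H}(Q\,\|\,P) + \mathcal{R}(\nu\,\|\,\mu), \tag{3.3}$$

and the relative entropy rate $\mathcal{H}(Q\,\|\,P)$ is independent of $T$ and given by (3.2).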
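The identity (3.3) can be checked numerically on a small example by enumerating all paths of a two-state chain. This is only an illustrative sketch: the kernels, the initial law `mu`, and the helper names are our own hypothetical choices, and the closed-form stationary law is specific to two-state chains.

```python
import itertools
import math

def two_state_kernel(a, b):
    """Transition matrix with jump probabilities a (0 -> 1) and b (1 -> 0)."""
    return [[1.0 - a, a], [b, 1.0 - b]]

def two_state_stationary(a, b):
    """Stationary distribution of the two-state kernel above."""
    return [b / (a + b), a / (a + b)]

def path_relative_entropy(nu, Q, mu, P, T):
    """R(Q_[0,T] || P_[0,T]) by brute-force enumeration of paths x_0, ..., x_T."""
    total = 0.0
    for path in itertools.product(range(len(Q)), repeat=T + 1):
        q_prob, p_prob = nu[path[0]], mu[path[0]]
        for x, y in zip(path, path[1:]):
            q_prob *= Q[x][y]
            p_prob *= P[x][y]
        total += q_prob * math.log(q_prob / p_prob)
    return total

def relative_entropy_rate(nu, Q, P):
    """Right-hand side of (3.2) for a finite chain."""
    n = len(Q)
    return sum(nu[x] * Q[x][y] * math.log(Q[x][y] / P[x][y])
               for x in range(n) for y in range(n))

Q = two_state_kernel(0.15, 0.25)
P = two_state_kernel(0.10, 0.20)
nu = two_state_stationary(0.15, 0.25)  # Q starts from its stationary law
mu = [0.5, 0.5]                        # arbitrary initial law for P
T = 6

lhs = path_relative_entropy(nu, Q, mu, P, T)
rhs = T * relative_entropy_rate(nu, Q, P) + sum(
    n * math.log(n / m) for n, m in zip(nu, mu))
print(lhs, rhs)  # the two numbers agree up to rounding
```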

As in Section [231 we will consider the sensitivity analysis problem, but this time in the context of both 
transient and stationary dynamics. This amounts to an asymptotic expansion of the relative entropy, and 
eventually the RER, in terms of a parameter perturbation. First we consider the path-space probability
measure $P^\theta_{[0,T]}$ where $\theta\in\mathbb{R}^K$ is a vector of the model parameters. We consider a perturbation $v\in\mathbb{R}^K$
in the parameter vector $\theta$ and the resulting path-space probability measure $P^{\theta+v}_{[0,T]}$. We start out with two
regularity conditions on the dependence of probability measures on the parameter $\theta$; these conditions are
not the weakest possible, but they are fairly simple to state.

Condition 3.1. There is a fixed reference probability measure $R\in\mathcal{P}(\mathcal{X})$ such that $p^\theta(x,dy)\ll R(dy)$
for all $x\in\mathcal{X}$ and $\theta\in\mathbb{R}^K$. Let $p^\theta(x,y)=\frac{dp^\theta(x,\cdot)}{dR(\cdot)}(y)$. Then we assume $(x,y,\theta)\mapsto p^\theta(x,y)$ is continuous
and for each fixed $x,y$ that $\theta\mapsto p^\theta(x,y)$ is $C^3$. Where needed, we also assume the existence of suitable
dominating functions for various functions of $p^\theta$.

Note that under this assumption, any stationary distribution $\mu^\theta$ will be absolutely continuous with
respect to $R$. It also holds that $P^\theta_{[0,T]}$ is absolutely continuous with respect to the product measure on
path space with marginals $R$, with a smooth ($C^3$) Radon-Nikodym derivative. Condition 3.1 is necessary for the
sensitivity results in finite and long times and it is directly verifiable, since it depends only on the local
dynamics $p^\theta(x,y)$. However, for some of the results presented here for infinite times, we additionally need
Condition 3.2 below, which is a regularity condition for the stationary measure of the process $P^\theta_{[0,T]}$.
Whenever this measure is analytically available, e.g., as a Gibbs measure, this condition can be checked directly.
However, typically the stationary measure is not known and in this case this condition is not always easy to
verify. Finally, conditions that ensure the regularity of the stationary measure $\mu^\theta$ and which rely primarily
on the existence of a spectral gap were given in [12].

Condition 3.2. There is a fixed reference probability measure $R\in\mathcal{P}(\mathcal{X})$ and for each $\theta$ a unique
stationary probability measure $\mu^\theta$ such that $\mu^\theta\ll R$ and $\theta\mapsto\frac{d\mu^\theta}{dR}(x)$ is $C^3$ for each fixed $x$. Where needed,
we also assume the existence of suitable dominating functions for various functions of $\mu^\theta$.




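Following [50], we define the path FIM for stationary Markov processes as the Hessian of the RER, at
least when it exists:

$$\mathbf{I}_{\mathcal{H}}(P^\theta) := \nabla_v^2\,\mathcal{H}\big(P^\theta\,\|\,P^{\theta+v}\big)\Big|_{v=0}. \tag{3.4}$$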


In the case of a discrete-time Markov Chain, under Condition 3.1 the path FIM reads (for a derivation see
Appendix A)

$$\mathbf{I}_{\mathcal{H}}(P^\theta) = \int_{\mathcal{X}}\mu^\theta(dx)\int_{\mathcal{X}} p^\theta(x,y)\,\big[\nabla_\theta\log p^\theta(x,y)\big]\big[\nabla_\theta\log p^\theta(x,y)\big]^T R(dy). \tag{3.5}$$

Notice that the path FIM, just like the RER, e.g., (3.2), can be computed from the transition probabilities under
mild ergodic average assumptions, [50]. Further examples of continuous-time Markov processes are discussed
in Appendix A. Finally, using (3.3), (3.4) and Conditions 3.1 and 3.2 we have the expansion


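$$\frac{1}{T}\,\mathcal{R}\big(P^\theta_{[0,T]}\,\|\,P^{\theta+v}_{[0,T]}\big) = \frac{1}{2}\,v^T\Big(\mathbf{I}_{\mathcal{H}}(P^\theta) + \frac{1}{T}\,\mathbf{I}(\mu^\theta)\Big)v + O(|v|^3), \tag{3.6}$$

where $\mathbf{I}_{\mathcal{H}}(P^\theta)$ is the path-space Fisher information (3.5) while $\mathbf{I}(\mu^\theta)$ is the Fisher information for the
stationary measure $\mu^\theta$, (2.30).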
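To illustrate (3.5) and the quadratic behavior of the RER, consider a two-state chain whose jump probabilities are themselves the parameters, $\theta = (a, b)$. This parametrization, the numerical values and the function names are our own assumptions, chosen only because the gradients of $\log p^\theta$ are available by hand. The sketch compares the exact RER under a small perturbation $v$ with the quadratic form $\frac{1}{2}v^T\mathbf{I}_{\mathcal{H}}(P^\theta)v$.

```python
import math

def kernel(theta):
    a, b = theta
    return [[1.0 - a, a], [b, 1.0 - b]]

def stationary(theta):
    a, b = theta
    return [b / (a + b), a / (a + b)]

def rer(theta, theta_pert):
    """H(P^theta || P^theta_pert) via (3.2); the stationary law is that of P^theta."""
    mu, Q, P = stationary(theta), kernel(theta), kernel(theta_pert)
    return sum(mu[x] * Q[x][y] * math.log(Q[x][y] / P[x][y])
               for x in range(2) for y in range(2))

def path_fim(theta):
    """Path FIM (3.5) for the two-state chain parametrized by theta = (a, b)."""
    a, b = theta
    mu, P = stationary(theta), kernel(theta)
    # Gradients of log p^theta(x, y) with respect to (a, b), computed by hand
    grads = {
        (0, 0): (-1.0 / (1.0 - a), 0.0),
        (0, 1): (1.0 / a, 0.0),
        (1, 0): (0.0, 1.0 / b),
        (1, 1): (0.0, -1.0 / (1.0 - b)),
    }
    fim = [[0.0, 0.0], [0.0, 0.0]]
    for (x, y), g in grads.items():
        for i in range(2):
            for j in range(2):
                fim[i][j] += mu[x] * P[x][y] * g[i] * g[j]
    return fim

theta = (0.3, 0.2)
v = (1e-3, -2e-3)
fim = path_fim(theta)
quadratic = 0.5 * sum(v[i] * fim[i][j] * v[j] for i in range(2) for j in range(2))
exact = rer(theta, (theta[0] + v[0], theta[1] + v[1]))
print(exact, quadratic)  # agree up to third-order terms in v
```

In this example the path FIM is diagonal, since each parameter affects the transitions out of one state only.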

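In the non-stationary regime we can use the expansion (2.29) and Condition 3.1 to obtain for any initial
measure $\mu$ of the stochastic process $P^\theta_{[0,T]}$ that is independent of $\theta$

$$\frac{1}{T}\,\mathcal{R}\big(P^\theta_{[0,T]}\,\|\,P^{\theta+v}_{[0,T]}\big) = \frac{1}{2T}\,v^T\,\mathbf{I}\big(P^\theta_{[0,T]}\big)\,v + O(|v|^3). \tag{3.7}$$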

Furthermore, assuming ergodicity of the process and similarly to dSU, we can obtain the path FIM as the 
asymptotic limit, [3, 




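$$\mathbf{I}_{\mathcal{H}}(P^\theta) = \lim_{T\to\infty}\frac{1}{T}\,\mathbf{I}\big(P^\theta_{[0,T]}\big). \tag{3.8}$$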


3.2. UQ Information inequalities for path-dependent observables. We consider as an observable
a measurable functional $\mathcal{F}=\mathcal{F}(X)$ of the process $\{X_t\}_{0\le t\le T}$. For any $T>0$ we define the centered
observable

$$\widetilde{\mathcal{F}}(X) = \mathcal{F}(X) - \mathbb{E}_{P_{[0,T]}}[\mathcal{F}],$$

and using the variational representation dSJj) of the cumulant-generating function, we obtain for any c > 0 

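$$\Lambda_{P_{[0,T]},\,T\widetilde{\mathcal{F}}}(c) = \log\mathbb{E}_{P_{[0,T]}}\big[e^{cT\widetilde{\mathcal{F}}}\big]
= \sup_{Q_{[0,T]}\ll P_{[0,T]}}\Big\{-\mathcal{R}\big(Q_{[0,T]}\,\|\,P_{[0,T]}\big) + cT\big(\mathbb{E}_{Q_{[0,T]}}[\mathcal{F}] - \mathbb{E}_{P_{[0,T]}}[\mathcal{F}]\big)\Big\}. \tag{3.9}$$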

Concentrating on the stationary regime, the path-space relative entropy scales linearly with time as shown in
Lemma 3.2. Moreover, if we consider observables for which $\mathbb{E}_{P_{[0,T]}}[\mathcal{F}]$ and $\mathbb{E}_{Q_{[0,T]}}[\mathcal{F}]$ are uniformly bounded
for all $T$, then the second term in the supremum in (3.9) also scales linearly with time. Therefore, the right
hand side of the equation scales at most linearly as $T\to\infty$ and its correct re-scaling for large times is given
by

$$\frac{1}{T}\Lambda_{P_{[0,T]},\,T\widetilde{\mathcal{F}}}(c)
= \sup_{Q_{[0,T]}\ll P_{[0,T]}}\Big\{-\frac{1}{T}\mathcal{R}\big(Q_{[0,T]}\,\|\,P_{[0,T]}\big) + c\big(\mathbb{E}_{Q_{[0,T]}}[\mathcal{F}] - \mathbb{E}_{P_{[0,T]}}[\mathcal{F}]\big)\Big\}. \tag{3.10}$$

One class of such observables which have a finite expectation as $T\to\infty$ is the case where $\mathcal{F}$ is bounded by
a constant. Another class of observables of this category which is of great interest in stochastic computing
is that of ergodic averages:

$$\mathcal{F}(X) = \frac{1}{T}\int_0^T f(X_s)\,ds, \tag{3.11}$$

for a bounded observable function $f$. Under suitable ergodic assumptions we have

$$\lim_{T\to\infty}\mathcal{F}(X) = \mathbb{E}_\mu[f] = \int f\,d\mu. \tag{3.12}$$

Next, we provide a result on path space which is similar to (12.111) . first obtained in [TKH] for measures 

$P$ and $Q$. We use the notation and the goal-oriented divergence formulation in Theorem 2.7. We show that
for (suitable) path-space observables, the analogue of relative entropy in (2.11) is now the concept of RER
(3.1).

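Theorem 3.3. (a) (Finite-time regime) Assume that the time-averaged cumulant generating function
(3.10) exists in a neighborhood of the origin. Define

$$\Xi_+\big(Q_{[0,T]}\,\|\,P_{[0,T]};\widetilde{\mathcal{F}}\big) := \inf_{c>0}\Big\{\frac{1}{cT}\Lambda_{P_{[0,T]},\,T\widetilde{\mathcal{F}}}(c) + \frac{1}{cT}\mathcal{R}\big(Q_{[0,T]}\,\|\,P_{[0,T]}\big)\Big\}, \tag{3.13}$$

$$\Xi_-\big(Q_{[0,T]}\,\|\,P_{[0,T]};\widetilde{\mathcal{F}}\big) := \sup_{c>0}\Big\{-\frac{1}{cT}\Lambda_{P_{[0,T]},\,T\widetilde{\mathcal{F}}}(-c) - \frac{1}{cT}\mathcal{R}\big(Q_{[0,T]}\,\|\,P_{[0,T]}\big)\Big\}. \tag{3.14}$$

Then we have the bounds

$$\Xi_-\big(Q_{[0,T]}\,\|\,P_{[0,T]};\widetilde{\mathcal{F}}\big) \le \mathbb{E}_{Q_{[0,T]}}[\mathcal{F}] - \mathbb{E}_{P_{[0,T]}}[\mathcal{F}] \le \Xi_+\big(Q_{[0,T]}\,\|\,P_{[0,T]};\widetilde{\mathcal{F}}\big). \tag{3.15}$$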

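In addition, based on Theorem 2.7 and Theorem 2.8 we have

$$\Xi_\pm\big(Q_{[0,T]}\,\|\,P_{[0,T]};\widetilde{\mathcal{F}}\big) = \frac{1}{T}\,\Lambda'_{P_{[0,T]},\,T\widetilde{\mathcal{F}}}\Big(\pm\Phi^{-1}\Big(\frac{1}{T}\mathcal{R}\big(Q_{[0,T]}\,\|\,P_{[0,T]}\big)\Big)\Big), \tag{3.16}$$

where

$$\Phi(c) := -\frac{1}{T}\Lambda_{P_{[0,T]},\,T\widetilde{\mathcal{F}}}(c) + \frac{c}{T}\,\Lambda'_{P_{[0,T]},\,T\widetilde{\mathcal{F}}}(c)$$

is a strictly increasing function on $(0,\bar c)$, where $\bar c = \sup\{c : \Lambda_{P_{[0,T]},\,T\widetilde{\mathcal{F}}}(c) < \infty\}$.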

(b) (Stationary regime) Consider the case of stationary processes and assume the conditions of Lemma 3.2.
Then in all formulas of part (a) we can substitute

$$\frac{1}{T}\,\mathcal{R}\big(Q_{[0,T]}\,\|\,P_{[0,T]}\big) = \mathcal{H}(Q\,\|\,P) + \frac{1}{T}\,\mathcal{R}(\nu\,\|\,\mu). \tag{3.17}$$

Proof. The proof follows immediately from Theorem 2.7 and Theorem 2.8 as well as the bounds (2.11)
and the relative entropy rate representation of the relative entropy in Lemma 3.2, e.g., (3.3). □

3.3. Infinite time UQ bounds. Here we discuss the extension of the previous UQ bounds to the
stationary asymptotic regime $T\to\infty$. In the process, we demonstrate the key role in controlling the bounds
played by the RER as well as connections with the theory of Large Deviations, BE- First we state our 
primary assumptions. 

Condition 3.3. For the centered cumulant-generating function we assume the limit

$$\lim_{T\to\infty}\frac{1}{T}\Lambda_{P_{[0,T]},\,T\widetilde{\mathcal{F}}}(c) = \Lambda_{P,\widetilde{\mathcal{F}}}(c)$$

exists and is finite in a neighborhood of the origin $c=0$.

It turns out that this is also the main condition for the Gärtner-Ellis Theorem in Large Deviations, [8].
In this context, the limiting cumulant generating function $\Lambda_{P,\widetilde{\mathcal{F}}}(c)$ can be calculated explicitly for various
examples and through the Legendre transform it is associated with the large deviations rate functional, [8,
Chapter 2.3]. For example, in the case of a discrete-time, finite state space Markov chain given by the
stochastic matrix $P=(p(x,y))$ and the time-averaged observable $\mathcal{F}=\frac{1}{T}\sum_{t=0}^{T-1}f(X_t)$ we have (see [5,
Chapter 3.1]),

$$\Lambda_{P,\mathcal{F}}(c) = \log\lambda\big(\Pi_f(c)\big), \tag{3.18}$$










where $\lambda(B)$ denotes the Perron-Frobenius eigenvalue of the matrix $B$, and $\Pi_f(c)=(\pi_f(x,y;c))$ the
non-negative matrix with elements $\pi_f(x,y;c)=p(x,y)\exp(c f(y))$. Due to the finiteness of the state space it is
easy to show in this case that $\Lambda_{P,\mathcal{F}}(c)$ is analytic and strictly convex in $c$, [5].
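Formula (3.18) is straightforward to evaluate numerically. The sketch below computes the Perron-Frobenius eigenvalue of the tilted matrix by plain power iteration; it assumes a primitive (irreducible, aperiodic) stochastic matrix so that the iteration converges, and the matrix `P`, the observable `f` and the function names are hypothetical illustration values.

```python
import math

def perron_eigenvalue(M, iters=2000):
    """Largest eigenvalue of a non-negative primitive matrix, by power iteration."""
    n = len(M)
    v = [1.0 / n] * n
    lam = 1.0
    for _ in range(iters):
        w = [sum(M[x][y] * v[y] for y in range(n)) for x in range(n)]
        lam = sum(w)                 # since v is normalized to sum to one
        v = [wi / lam for wi in w]   # renormalize the iterate
    return lam

def limiting_cgf(P, f, c):
    """Lambda(c) = log lambda(Pi_f(c)) with pi_f(x,y;c) = p(x,y) exp(c f(y)), cf. (3.18)."""
    n = len(P)
    tilted = [[P[x][y] * math.exp(c * f[y]) for y in range(n)] for x in range(n)]
    return math.log(perron_eigenvalue(tilted))

# Hypothetical two-state example; the stationary law of P is (2/3, 1/3)
P = [[0.90, 0.10], [0.20, 0.80]]
f = [0.0, 1.0]
h = 1e-4
slope_at_zero = (limiting_cgf(P, f, h) - limiting_cgf(P, f, -h)) / (2.0 * h)
print(limiting_cgf(P, f, 0.0), slope_at_zero)
```

The value at $c=0$ vanishes and the slope at the origin recovers the stationary mean of $f$, which gives a quick sanity check on any implementation.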

Next we apply Condition 3.3 and the asymptotics (3.1) to the transient regime bounds (3.15) in Theorem 3.3
to obtain the following theorem for the $T\to\infty$ limit. The second statement follows along the lines
of (IXTD. 

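Theorem 3.4. Assume Condition 3.3 and define

$$\Xi_+\big(Q\,\|\,P;\widetilde{\mathcal{F}}\big) := \inf_{c>0}\Big\{\frac{1}{c}\Lambda_{P,\widetilde{\mathcal{F}}}(c) + \frac{1}{c}\mathcal{H}(Q\,\|\,P)\Big\}, \tag{3.19}$$

$$\Xi_-\big(Q\,\|\,P;\widetilde{\mathcal{F}}\big) := \sup_{c>0}\Big\{-\frac{1}{c}\Lambda_{P,\widetilde{\mathcal{F}}}(-c) - \frac{1}{c}\mathcal{H}(Q\,\|\,P)\Big\}. \tag{3.20}$$

Then, we have the bounds

$$\Xi_-\big(Q\,\|\,P;\widetilde{\mathcal{F}}\big) \le \limsup_{T\to\infty}\big(\mathbb{E}_{Q_{[0,T]}}[\mathcal{F}] - \mathbb{E}_{P_{[0,T]}}[\mathcal{F}]\big) \le \Xi_+\big(Q\,\|\,P;\widetilde{\mathcal{F}}\big). \tag{3.21}$$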

In addition, similarly to Theorem \2.t^ we have 


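$$\Xi_\pm\big(Q\,\|\,P;\widetilde{\mathcal{F}}\big) = \Lambda'_{P,\widetilde{\mathcal{F}}}\big(\pm\Phi^{-1}(\mathcal{H}(Q\,\|\,P))\big), \tag{3.22}$$

where

$$\Phi(c) := -\Lambda_{P,\widetilde{\mathcal{F}}}(c) + c\,\Lambda'_{P,\widetilde{\mathcal{F}}}(c)$$

is a strictly increasing function on $(0,\bar c)$, where $\bar c = \sup\{c : \Lambda_{P,\widetilde{\mathcal{F}}}(c) < \infty\}$.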
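As an illustration of (3.19)-(3.21), the following sketch evaluates both goal-oriented divergences for a pair of two-state chains by a simple grid search over $c$. Everything specific here is an assumption made for the example: the matrices, the observable, the grid, and the use of closed-form expressions that hold only for 2x2 matrices. A grid search gives a valid (if slightly loose) bound, because each $c>0$ yields an admissible value inside the infimum or supremum.

```python
import math

def perron_2x2(M):
    """Largest eigenvalue of a 2x2 matrix with positive entries (closed form)."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return 0.5 * (tr + math.sqrt(tr * tr - 4.0 * det))

def stationary_2state(K):
    a, b = K[0][1], K[1][0]
    return [b / (a + b), a / (a + b)]

def centered_cgf(P, f, c):
    """Limiting CGF of the centered time average: log lambda(Pi_f(c)) - c E_mu[f]."""
    mu = stationary_2state(P)
    mean = mu[0] * f[0] + mu[1] * f[1]
    tilted = [[P[x][y] * math.exp(c * f[y]) for y in range(2)] for x in range(2)]
    return math.log(perron_2x2(tilted)) - c * mean

def rer(Q, P):
    nu = stationary_2state(Q)
    return sum(nu[x] * Q[x][y] * math.log(Q[x][y] / P[x][y])
               for x in range(2) for y in range(2))

def goal_oriented_divergences(Q, P, f, c_grid):
    """Grid approximations of Xi_- (3.20) and Xi_+ (3.19)."""
    h = rer(Q, P)
    xi_plus = min((centered_cgf(P, f, c) + h) / c for c in c_grid)
    xi_minus = max((-centered_cgf(P, f, -c) - h) / c for c in c_grid)
    return xi_minus, xi_plus

P = [[0.90, 0.10], [0.20, 0.80]]
Q = [[0.85, 0.15], [0.25, 0.75]]
f = [0.0, 1.0]
c_grid = [0.01 * k for k in range(1, 2001)]

xi_minus, xi_plus = goal_oriented_divergences(Q, P, f, c_grid)
nu, mu = stationary_2state(Q), stationary_2state(P)
bias = sum(n * fx for n, fx in zip(nu, f)) - sum(m * fx for m, fx in zip(mu, f))
print(xi_minus, bias, xi_plus)  # the stationary bias lies between the two bounds
```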

3.4. Linearization of the UQ bounds. The bounds in Theorems 3.3 and 3.4 can become (asymptotically)
more explicit in the case where the relative entropy or the RER $\mathcal{H}(Q\,\|\,P)$ is small, that is by
expanding $\Xi_\pm(Q_{[0,T]}\,\|\,P_{[0,T]};\widetilde{\mathcal{F}})$ in (3.15). Furthermore, the RER can be explicitly calculated in several
examples discussed earlier in Section 3 and in Appendix A. More specifically we have the following asymptotics.

Lemma 3.5. Assume that the cumulant generating function $\Lambda_{P_{[0,T]},\,T\widetilde{\mathcal{F}}}(c)$ exists in a neighborhood of the
origin. Assume also that

$$\frac{1}{T}\,\mathcal{R}\big(Q_{[0,T]}\,\|\,P_{[0,T]}\big) = \rho^2$$

for two path probability measures $P_{[0,T]}$, $Q_{[0,T]}$. Note that by (3.17) in the stationary case this is essentially
an assumption on the relative entropy rate $\mathcal{H}(Q\,\|\,P)$. Then, there exists a function $c_\pm(\rho)$ which is the unique
minimizer (resp. maximizer) of (3.13) (resp. (3.14)). Furthermore, there is $\rho_0>0$ such that $c_\pm(\rho)$ is $C^\infty$
in $(0,\rho_0)$ and admits the perturbation expansion

$$c_\pm(\rho) = c_{\pm,1}\,\rho + O(\rho^2), \quad\text{where}\quad c_{\pm,1} = \sqrt{\frac{2}{\frac{1}{T}\mathrm{Var}_{P_{[0,T]}}(T\mathcal{F})}}\,. \tag{3.23}$$

Proof. The proof follows the same steps as the proof of Lemma 2.10. □

Substituting the expansion in $\rho$ for the optimal value (3.23) into the expansion of $\Xi_\pm(Q_{[0,T]}\,\|\,P_{[0,T]};\widetilde{\mathcal{F}})$
we obtain asymptotics of the upper and lower bounds for the weak error in $\rho^2 = \frac{1}{T}\mathcal{R}(Q_{[0,T]}\,\|\,P_{[0,T]})$.

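Theorem 3.6 (Linearization). Under the assumptions of Lemma 3.5 we have:

(a) (Finite-time regime)

$$\big|\mathbb{E}_{Q_{[0,T]}}[\mathcal{F}] - \mathbb{E}_{P_{[0,T]}}[\mathcal{F}]\big| \le \sqrt{\tfrac{1}{T}\mathrm{Var}_{P_{[0,T]}}(T\mathcal{F})}\,\sqrt{\tfrac{2}{T}\mathcal{R}\big(Q_{[0,T]}\,\|\,P_{[0,T]}\big)} + O\Big(\tfrac{1}{T}\mathcal{R}\big(Q_{[0,T]}\,\|\,P_{[0,T]}\big)\Big), \tag{3.24}$$

(b) (Stationary regime) In this case (3.3) implies

$$\big|\mathbb{E}_{Q_{[0,T]}}[\mathcal{F}] - \mathbb{E}_{P_{[0,T]}}[\mathcal{F}]\big| \le \sqrt{\tfrac{1}{T}\mathrm{Var}_{P_{[0,T]}}(T\mathcal{F})}\,\sqrt{2\Big(\mathcal{H}(Q\,\|\,P) + \tfrac{1}{T}\mathcal{R}(\nu\,\|\,\mu)\Big)} + O\Big(\tfrac{1}{T}\mathcal{R}\big(Q_{[0,T]}\,\|\,P_{[0,T]}\big)\Big). \tag{3.25}$$

As in the static case, the term $O\big(\tfrac{1}{T}\mathcal{R}(Q_{[0,T]}\,\|\,P_{[0,T]})\big)$ can be further quantified using the asymptotic
expansions of $c_\pm(\rho)$ and (3.16) in $\rho$.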

Remark 3.1. For observables which are time averages, e.g., $\mathcal{F}(X)=\frac{1}{T}\sum_{k=0}^{T-1}f(X_k)$, and for a stationary
Markov process $P_{[0,T]}$, the variance terms in Theorem 3.6 take the form of an autocorrelation function
and can be uniformly bounded in $T$. Specifically,

$$\frac{1}{T}\mathrm{Var}_{P_{[0,T]}}(T\mathcal{F}) = \mathrm{Var}_\mu(f) + 2\sum_{k=1}^{T-1}\Big(1-\frac{k}{T}\Big)A_f(k) =: \tau_T(f), \tag{3.26}$$

where $A_f(k) := \mathbb{E}_{P_{[0,T]}}\big[(f(X_k)-\mathbb{E}_\mu[f(X_0)])(f(X_0)-\mathbb{E}_\mu[f(X_0)])\big]$ is the stationary covariance function. Recall
that when $\tau(f) := \lim_{T\to\infty}\tau_T(f) < \infty$ then $\tau(f)$ is known as the integrated autocorrelation time (IAT),
[24]. The proof of (3.26) is carried out in Lemma 3.8 below, see (3.34).
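The identity (3.26) can be verified on a small finite chain by comparing the autocovariance sum with a brute-force variance computed over all paths. This is a sketch under our own assumptions: a two-state chain started from its stationary law, a hypothetical observable `f`, and illustrative function names.

```python
import itertools

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def autocovariances(P, mu, f, kmax):
    """A_f(k) = E[(f(X_k) - m)(f(X_0) - m)] for k = 0..kmax under stationarity."""
    n = len(P)
    mean = sum(m * fx for m, fx in zip(mu, f))
    fc = [fx - mean for fx in f]
    Pk = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    out = []
    for _ in range(kmax + 1):
        out.append(sum(mu[x] * fc[x] * Pk[x][y] * fc[y]
                       for x in range(n) for y in range(n)))
        Pk = mat_mul(Pk, P)
    return out

def tau_T(P, mu, f, T):
    """Right-hand side of (3.26)."""
    A = autocovariances(P, mu, f, T - 1)
    return A[0] + 2.0 * sum((1.0 - k / T) * A[k] for k in range(1, T))

def scaled_variance_brute_force(P, mu, f, T):
    """(1/T) Var(sum_{k<T} f(X_k)) by enumerating all paths x_0, ..., x_{T-1}."""
    first = second = 0.0
    for path in itertools.product(range(len(P)), repeat=T):
        prob = mu[path[0]]
        for x, y in zip(path, path[1:]):
            prob *= P[x][y]
        s = sum(f[x] for x in path)
        first += prob * s
        second += prob * s * s
    return (second - first * first) / T

P = [[0.90, 0.10], [0.20, 0.80]]
mu = [2.0 / 3.0, 1.0 / 3.0]   # stationary law of P
f = [0.0, 1.0]
print(tau_T(P, mu, f, 8), scaled_variance_brute_force(P, mu, f, 8))
```

For a two-state chain the autocovariance decays geometrically with ratio $r = 1 - p(0,1) - p(1,0)$, so $\tau_T(f)$ approaches $\mathrm{Var}_\mu(f)(1+r)/(1-r)$ as $T$ grows.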

3.5. Sensitivity bounds in path space. As in Section we will consider the sensitivity analysis 
problem, but this time in both the context of transient and stationary dynamics. First we consider the 
path-space probability measure $P^\theta_{[0,T]}$ where $\theta\in\mathbb{R}^K$ is a vector of the model parameters. We consider a
perturbation $v\in\mathbb{R}^K$ in the parameter vector $\theta$ and the resulting path-space probability measure $P^{\theta+v}_{[0,T]}$. We
focus first on the discrete time model. The continuous time calculations are carried out in a similar manner,
and we refer to Appendix A for the related formulas.

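The next theorem readily follows from (3.24) and (3.25) and the asymptotics in $\epsilon$ in (3.6) and (3.7).

Theorem 3.7. Assume the conditions of Lemma 3.5.

(a) (Stationary regime) Furthermore, assume the conditions of Lemma 3.2 and Conditions 3.1 and 3.2. For
any $v\in\mathbb{R}^K$ and $\epsilon\neq 0$, we have

$$\frac{1}{|\epsilon|}\Big|\mathbb{E}_{P^{\theta+\epsilon v}_{[0,T]}}[\mathcal{F}] - \mathbb{E}_{P^\theta_{[0,T]}}[\mathcal{F}]\Big| \le \sqrt{\tfrac{1}{T}\mathrm{Var}_{P^\theta_{[0,T]}}(T\mathcal{F})}\,\sqrt{v^T\Big(\mathbf{I}_{\mathcal{H}}(P^\theta) + \tfrac{1}{T}\mathbf{I}(\mu^\theta)\Big)v} + O(\epsilon), \tag{3.27}$$

and

$$\big|S_{\mathcal{F},v}\big(P^\theta_{[0,T]}\big)\big| \le \sqrt{\tfrac{1}{T}\mathrm{Var}_{P^\theta_{[0,T]}}(T\mathcal{F})}\,\sqrt{v^T\Big(\mathbf{I}_{\mathcal{H}}(P^\theta) + \tfrac{1}{T}\mathbf{I}(\mu^\theta)\Big)v}\,, \tag{3.28}$$

where the sensitivity index $S_{\mathcal{F},v}(P^\theta_{[0,T]})$ is defined in (2.31).

(b) (Finite-time regime) If the process $P^\theta_{[0,T]}$ is not stationary then we only need to assume Condition 3.1.
Then we have the same bounds as in (a), however the term $v^T\big(\mathbf{I}_{\mathcal{H}}(P^\theta) + \tfrac{1}{T}\mathbf{I}(\mu^\theta)\big)v$ is replaced by the term
$v^T\,\mathbf{I}(P^\theta_{[0,T]})\,v/T$; we also note the uniform bound in the time horizon $T$ of the latter term due to (3.8).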

We remark that in the stationary regime and for time-averaged observables such as (3.11), it holds that

$$S_{\mathcal{F},v}\big(P^\theta_{[0,T]}\big) = S_{f,v}(\mu^\theta), \tag{3.29}$$

where $\mu^\theta$ is the stationary distribution, due to the regularity assumed in Condition 3.2.

Remark 3.2. Bounds such as (3.28) relate stochastic gradient-type sensitivity analysis methods,
such as likelihood ratio/Girsanov [32] and path-wise methods [36], which develop efficient estimators for the
sensitivity indices (2.31), with information theory based methods, showing that the latter provide a sensitivity
bound on (|2.3ip . Similarly the bound (13.271) relates sensitivity methods relying on finite-differencing [34l[Tl[^ 
































with information-theory sensitivity analysis methods, [SD]. We refer to the inequalities (13.271) and (13.281) as 
sensitivity bounds. These bounds can be computed efficiently and can provide fast screening of insensitive 
observables, as well as parameters or directions in the parameter space. We refer to [1] for more details, 
implementations and examples. 

Next we focus on the infinite-time asymptotic regime and the related sensitivity bounds. Taking the 
limit T —> oo we obtain bounds for time-averaged observables. First, we recall a result on the asymptotics 
of such observables, [37], and provide a proof for completeness in our presentation. 

Lemma 3.8. Under the assumptions of Theorem and for observables of the form 

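$$\mathcal{F}_T(X) = \frac{1}{T}\sum_{i=0}^{T-1} f(X_i), \tag{3.30}$$

the following conclusions hold. If the process is stationary and the series defined below
converges absolutely, then the limit

$$\lim_{T\to\infty}\frac{1}{T}\mathrm{Var}_{P^\theta_{[0,T]}}\big(T\mathcal{F}_T(X)\big) = \tau(f) \tag{3.31}$$

exists, where $\tau(f)$ is the integrated autocorrelation time (IAT), defined as

$$\tau(f) := \lim_{T\to\infty}\tau_T(f) = \mathrm{Var}_{\mu^\theta}(f) + 2\sum_{k=1}^{\infty}A_f(k), \tag{3.32}$$

and

$$A_f(k) := \mathbb{E}_{P^\theta_{[0,T]}}\big[(f(X_k)-\mathbb{E}_{\mu^\theta}[f(X_0)])(f(X_0)-\mathbb{E}_{\mu^\theta}[f(X_0)])\big]$$

is the stationary covariance function of the process $X$.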

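Proof. A direct computation of the time-averaged variance gives

$$
\begin{aligned}
\frac{1}{T}\mathrm{Var}_{P^\theta_{[0,T]}}\big(T\mathcal{F}_T(X)\big)
&= \frac{1}{T}\,\mathbb{E}_{P^\theta_{[0,T]}}\Big[\Big(\sum_{i=0}^{T-1}f(X_i) - \mathbb{E}_{P^\theta_{[0,T]}}\Big[\sum_{i=0}^{T-1}f(X_i)\Big]\Big)^2\Big] \\
&= \frac{1}{T}\sum_{i=0}^{T-1}\sum_{j=0}^{T-1}\mathbb{E}_{P^\theta_{[0,T]}}\big[(f(X_i)-\mathbb{E}_{P^\theta_{[0,T]}}[f(X_i)])(f(X_j)-\mathbb{E}_{P^\theta_{[0,T]}}[f(X_j)])\big] \\
&= \frac{1}{T}\sum_{i=0}^{T-1}\sum_{j=0}^{T-1}\mathrm{Cov}_f(i,j),
\end{aligned}
\tag{3.33}
$$

where $\mathrm{Cov}_f(i,j)$ is the covariance between $f(X_i)$ and $f(X_j)$. Due to the stationarity, we have that under
$P^\theta_{[0,T]}$ each $X_i$ is distributed according to $\mu^\theta$, hence $\mathrm{Cov}_f(i,j) = \mathbb{E}_{P^\theta_{[0,T]}}\big[(f(X_i)-\mathbb{E}_{\mu^\theta}[f(X_0)])(f(X_j)-\mathbb{E}_{\mu^\theta}[f(X_0)])\big]
= \mathrm{Cov}_f(i-j,0) = A_f(i-j)$. Therefore,

$$\frac{1}{T}\mathrm{Var}_{P^\theta_{[0,T]}}\big(T\mathcal{F}_T(X)\big) = \frac{1}{T}\sum_{k=-T+1}^{T-1}(T-|k|)\,\mathrm{Cov}_f(k,0) = \sum_{k=-T+1}^{T-1}\Big(1-\frac{|k|}{T}\Big)A_f(k). \tag{3.34}$$

Sending $T\to\infty$, we obtain from dominated convergence that

$$\lim_{T\to\infty}\frac{1}{T}\mathrm{Var}_{P^\theta_{[0,T]}}\big(T\mathcal{F}_T(X)\big) = \sum_{k=-\infty}^{\infty}A_f(k) = \tau(f). \tag{3.35}$$

□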










For stationary processes in continuous time, the formula for the IAT is given by

$$\tau(f) = \int_{-\infty}^{\infty}A_f(t)\,dt, \tag{3.36}$$

where, as in the discrete time case, $A_f(t) = \mathbb{E}_{P^\theta}\big[(f(X_t)-\mathbb{E}_{\mu^\theta}[f(x)])(f(X_0)-\mathbb{E}_{\mu^\theta}[f(x)])\big]$ is the stationary
covariance between $f(X_t)$ and $f(X_0)$.

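Now the following theorem readily follows from (3.28).

Theorem 3.9 (Infinite-time). Under Conditions 3.1 and 3.2 and the assumptions of the previous
lemma, the following hold. For any $v\in\mathbb{R}^K$,

$$\big|S_{f,v}(\mu^\theta)\big| \le \sqrt{\tau(f)}\,\sqrt{v^T\,\mathbf{I}_{\mathcal{H}}(P^\theta)\,v}\,, \tag{3.37}$$

where we used the fact that $\mathbb{E}_{P^\theta_{[0,T]}}[\mathcal{F}_T] = \mathbb{E}_{\mu^\theta}[f]$ for any stationary process and therefore $S_{\mathcal{F}_T,v}(P^\theta_{[0,T]}) =
S_{f,v}(\mu^\theta)$ for the sensitivity indices defined by (2.31).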

3.6. Some practical implications for sensitivity bounds. Given the results in Section 3.5, as
well as the computational feasibility of RER and path FIM demonstrated in [30, 31], we briefly investigate
sensitivity bounds for more general functionals than the time averages (3.11) and less stringent conditions
than those in Theorem 3.9. First, based on Theorem 3.7(b), it follows that we can consider any path-space
observables $\mathcal{F}$ such that

$$\frac{1}{T}\mathrm{Var}_{P^\theta_{[0,T]}}(T\mathcal{F}) = T\,\mathrm{Var}_{P^\theta_{[0,T]}}(\mathcal{F}) \le C^2 < \infty \quad\text{uniformly in } T, \tag{3.38}$$

for some constant $C$. Next, using (3.8) we obtain from Theorem 3.7(b) and (3.38) the limiting sensitivity
bound

$$\limsup_{T\to\infty}\big|S_{\mathcal{F},v}\big(P^\theta_{[0,T]}\big)\big| \le C\sqrt{v^T\,\mathbf{I}_{\mathcal{H}}(P^\theta)\,v}\,. \tag{3.39}$$

Note that in contrast to Theorem 3.9, here we only need to assume the easily verifiable Condition 3.1,
which depends solely on the regularity in $\theta$ of the local dynamics. Although the existence of the sensitivity
index at $T=\infty$ is not guaranteed due to the absence of (the hard to verify) Condition 3.2 or related
conditions in [12], the sensitivity indices $S_{\mathcal{F},v}(P^\theta_{[0,T]})$ remain controlled uniformly in time due to (3.39). The
boundedness of the variance associated with the observable $\mathcal{F}$ in (3.38) can be monitored in the course of
an actual simulation, while the path FIM in (3.39) is an easy to sample observable, as demonstrated in [30].
Furthermore, the path FIM can for certain classes of stochastic dynamics scale linearly with the number
of model parameters, making it computationally tractable even for systems with a very large number of
parameters. For instance, see m for the case of complex biochemical reaction networks where the graph 
structure and the type of reaction rates induce a block diagonal structure on the path FIM; we also refer to 
Figure 1 in [31] for a demonstration. 

In a second direction geared also towards practical implementation, we compare (3.37) and (3.39) to the
earlier static bound (2.36). Indeed, even though the form of the sensitivity bounds (3.37) and (3.39) is
similar to (2.36), there are some substantial differences and advantages in considering the path-space bounds
of this section. More specifically, when we want to study the sensitivity of ergodic averages such as (3.30),
we can either use the path space estimate in (3.37) or alternatively the equilibrium bound (2.36), i.e.,

$$\big|S_{f,v}(\mu^\theta)\big| \le \sqrt{\mathrm{Var}_{\mu^\theta}(f)}\,\sqrt{v^T\,\mathbf{I}(\mu^\theta)\,v}\,. \tag{3.40}$$

On the one hand, (3.40) involves the FIM of the equilibrium measures $\mu^\theta$, which we do not typically have
available in most non-equilibrium systems such as biochemical networks, reaction-diffusion mechanisms or
driven systems. On the other hand, the path-wise estimate (3.37) can in principle always be computed since it involves
only the local dynamics $p^\theta$ in the path FIM (3.5), see for instance [30] and [31].

3.7. Cramer-Rao inequalities for time-series. The sensitivity bounds (3.28) and (3.37) can be
considered as an extension of the Cramer-Rao inequality for the time-series of Markov processes. Indeed, we
recall that for a parametric family of probability measures $P^\theta$, where for simplicity in the presentation we
assume that $\theta$ is scalar, the Cramer-Rao inequality provides a lower bound for the variance of a
statistical estimator. Specifically, assume a biased estimator $\hat\theta = f(X)$ of the parameter $\theta$ with bias function
$\psi(\theta)$, i.e., $\mathbb{E}_{P^\theta}[f] = \psi(\theta)$. Then the Cramer-Rao bound for a scalar parameter $\theta$ states that, [15],

$$\mathrm{Var}_{P^\theta}(\hat\theta) \ge \frac{(\psi'(\theta))^2}{\mathbf{I}(P^\theta)}\,. \tag{3.41}$$


Upon rearranging, this bound is precisely the sensitivity bound (2.36), where the expected value of the
observable $f$ is the biased estimator of an unknown deterministic parameter in a family of probability
measures. Furthermore, it is also known, [HIE], that such bounds are sharp in the sense that for specific 
estimators (observables) such as the Maximum Likelihood Estimator, the bound (3.41), (2.36) becomes an
equality.

In the same sense, we can obtain a new Cramer-Rao type inequality for time series stationary statistics
based on our UQ information bounds in path-space. Indeed, path-space observables such as $\mathcal{F}_T(X) =
\frac{1}{T}\sum_i f(X_i)$ play the role of the statistical estimator for $\theta$ (i.e., $\hat\theta = \mathcal{F}_T(X)$) and the sensitivity bound (3.37)
constitutes a Cramer-Rao lower bound for the IAT (3.32),

$$\tau_{P^\theta}(f) \ge \frac{(\psi'(\theta))^2}{\mathbf{I}_{\mathcal{H}}(P^\theta)}\,, \tag{3.42}$$

where $\psi(\theta) = \mathbb{E}_{P^\theta}[\mathcal{F}_T]$ is the bias of the estimator. Therefore, for dependent samples created for instance
by Markov Chain Monte Carlo methods [24], the lower bound (3.42) can be utilized. Finally, we remark
that estimators with dependent samples generally have larger variance than estimators using independent
samples; however, for the same amount of computational time, a larger number of dependent samples than
independent samples can be drawn. Hence it is not clear which estimator has better performance in terms of
reduced variance for a given computational cost. In this direction, the Cramer-Rao bound (3.42) may be
very useful.

4. Demonstration Examples. This section demonstrates the application of the derived bounds for
several stochastic models. The sensitivity bound derived in Section 2.4 is utilized in the first two examples
where the sharpness of the bound is discussed. In the third and fourth examples, both stationary and path
space sensitivity bounds are computed and compared for various observable functions. In these examples,
the stationary bounds can be slightly sharper than the bounds that utilize the path FIM; however, stationary
bounds are rarely explicitly available. Indeed, the birth/death process presented in Section 4.3 is a special
case of a single-species biochemical reaction network with an explicit stationary distribution; however, for
reaction networks with more species the stationary distribution is generally unavailable. Similarly, the
stationary distribution in Section 4.4, where a stochastic differential equation (SDE) example is considered,
is not generally known; for instance, in SDEs with additive noise where the drift term is not of conservative
type, i.e., not the gradient of an appropriate function. For such stochastic models, the only available option for
a tractable sensitivity bound is the path-space sensitivity bound (3.37).

4.1. Exponential family of distributions. A probability density function belongs to the exponential
family if it admits the following canonical decomposition [28]

$$P^\theta(x) = \exp\big\{t(x)^T\theta - F(\theta) + k(x)\big\},$$

where $t(x) = [t_1(x),\dots,t_K(x)]^T$ is the sufficient statistics vector, $\theta\in\mathbb{R}^K$ is the parameter vector, $F(\cdot)$ is
the log-normalizer (free energy in statistical physics) and $k(x)$ is the carrier function (associated with the
prior probability measure in statistical physics). The statistics $t(x)$ are called “sufficient” because they contain
all the information needed for the estimation of the parameters. Considering the sufficient statistics as
observables, the corresponding sensitivity indices can be analytically calculated as

$$S_{t_k,\theta_l}(P^\theta) = \frac{\partial}{\partial\theta_l}\mathbb{E}_{P^\theta}[t_k] = \frac{\partial^2 F}{\partial\theta_k\,\partial\theta_l}(\theta), \qquad k,l = 1,\dots,K.$$
The covariance matrix of the sufficient statistics vector equals the Hessian of the log-normalizer, $F$, (i.e.,
$\mathrm{Cov}_{P^\theta}(t(x)) = \nabla^2 F(\theta)$), while the relative entropy of $P^\theta$ w.r.t. $P^{\theta+\epsilon}$ can be written as the Bregman
divergence of the log-normalizer on swapped natural parameters [28] given by

$$\mathcal{R}\big(P^\theta\,\|\,P^{\theta+\epsilon}\big) = F(\theta+\epsilon) - F(\theta) - \epsilon^T\nabla F(\theta).$$

A straightforward Taylor series expansion of $F$ in $\epsilon$ implies that the Fisher information matrix, $\mathbf{I}(P^\theta)$, defined
in (2.30) equals the Hessian of the log-normalizer, too. Therefore, for sufficient statistics of the exponential
family distribution, Theorem 2.13 states that

$$\big|S_{t_k,\theta_l}(P^\theta)\big| = \Big|\frac{\partial^2 F}{\partial\theta_k\,\partial\theta_l}(\theta)\Big| \le \sqrt{\frac{\partial^2 F}{\partial\theta_k^2}(\theta)}\,\sqrt{\frac{\partial^2 F}{\partial\theta_l^2}(\theta)}\,. \tag{4.1}$$

Notice that the inequality becomes an equality when $k = l$. From a parameter estimation perspective, the
equality of the bound of the $k$-th sufficient statistic with respect to the $k$-th parameter is equivalent to the
fact that $t_k(x)$ is an efficient estimator of $\theta_k$, $k = 1,\dots,K$. In other words, the Cramer-Rao bound (3.41) is
attained, [20, Thm 5.12]. Finally, another bound for the sensitivity indices can be obtained directly from the
properties of the Hessian of P: the log-normalizer, P, is a strictly convex function [^, hence, its Hessian is 
positive semi-definite which results in the bound $|S_{t_k,\theta_l}(P^\theta)| \le \frac{1}{2}\big(\frac{\partial^2 F}{\partial\theta_k^2}(\theta) + \frac{\partial^2 F}{\partial\theta_l^2}(\theta)\big)$. However, this latter
bound is less tight than the information-based bound (4.1) since the geometric mean is always less than or equal
to the arithmetic mean.
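A small numerical illustration of (4.1) can be set up on a finite sample space, where the log-normalizer and its Hessian are available by direct summation. The family below (states $\{0,1,2,3\}$, sufficient statistics $t(x) = (x, x^2)$, zero carrier function) and the parameter values are hypothetical choices made only for this sketch.

```python
import math

STATES = [0, 1, 2, 3]

def sufficient_statistics(x):
    return (float(x), float(x * x))

def probabilities(theta):
    """P^theta(x) proportional to exp(t(x)^T theta) on the finite state set."""
    weights = [math.exp(sum(t * th for t, th in zip(sufficient_statistics(x), theta)))
               for x in STATES]
    z = sum(weights)
    return [w / z for w in weights]

def mean_statistics(theta):
    p = probabilities(theta)
    return [sum(pi * sufficient_statistics(x)[k] for pi, x in zip(p, STATES))
            for k in range(2)]

def covariance(theta):
    """Covariance of t(x); equals the Hessian of F and the sensitivity indices."""
    p, m = probabilities(theta), mean_statistics(theta)
    cov = [[0.0, 0.0], [0.0, 0.0]]
    for pi, x in zip(p, STATES):
        t = sufficient_statistics(x)
        for k in range(2):
            for l in range(2):
                cov[k][l] += pi * (t[k] - m[k]) * (t[l] - m[l])
    return cov

theta = (0.2, -0.3)
cov = covariance(theta)
index = cov[0][1]                                      # S_{t_1, theta_2}
information_bound = math.sqrt(cov[0][0] * cov[1][1])   # right-hand side of (4.1)
arithmetic_bound = 0.5 * (cov[0][0] + cov[1][1])       # cruder Hessian-based bound

# Finite-difference check that the index is d E[t_1] / d theta_2
h = 1e-5
fd = (mean_statistics((theta[0], theta[1] + h))[0]
      - mean_statistics((theta[0], theta[1] - h))[0]) / (2.0 * h)
print(index, fd, information_bound, arithmetic_bound)
```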

4.2. Stochastic differential equation example. We consider the differential equation

$$\dot u = -Xu, \qquad u(0) = u_0,$$

where $X$ is a Gaussian random variable with mean $\mu$ and variance $\sigma^2$. This stochastic model has been
previously utilized for the assessment of uncertainty quantification bounds in [21]. The stochastic solution
of the equation is

$$u(t) = u_0 e^{-Xt},$$

whose distribution law is log-normal with parameters $\log(u_0)-\mu t$ and $(\sigma t)^2$. The probability density function
is given at time instant $t$ by

$$P^\theta(u) = \frac{1}{u\,\sigma t\sqrt{2\pi}}\exp\Big\{-\frac{(\log(u)-\log(u_0)+\mu t)^2}{2(\sigma t)^2}\Big\},$$

where $\theta = [\mu,\sigma]^T$, and the dependence of the density on time $t$ as well as on the initial data $u_0$ is hidden
for the sake of notational simplicity. The goal is to compute the observable that quantifies the probability
of $u(t)$ being larger than a prescribed value, $\bar u$, at time instant $t$. This is a failure probability and can be
written as an ensemble average,

$$P_f := \mathbb{E}_{P^\theta}\big[\chi_{\{u>\bar u\}}(u)\big], \tag{4.2}$$

where the observable function is the characteristic function (i.e., $f(u) = \chi_{\{u>\bar u\}}(u)$). Notice that even
though the log-normal distribution belongs to the exponential family, here we are not interested in the natural
parameters or the sufficient statistics, but we rather focus on the observable (4.2). Therefore, the general
setting of the previous subsection does not apply. Nevertheless, calculations are still straightforward, and
the sensitivity index for $\mu$ is given by

$$S_{\chi_{\{u>\bar u\}},\mu}(P^\theta) = -\mathbb{E}_{P^\theta}\Big[\chi_{\{u>\bar u\}}(u)\,\frac{\log(u)-\log(u_0)+\mu t}{\sigma^2 t}\Big],$$

while the sensitivity index for the standard deviation $\sigma$ is

$$S_{\chi_{\{u>\bar u\}},\sigma}(P^\theta) = \mathbb{E}_{P^\theta}\Big[\chi_{\{u>\bar u\}}(u)\,\frac{(\log(u)-\log(u_0)+\mu t)^2-\sigma^2t^2}{\sigma^3 t^2}\Big].$$






















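The variance of the observable is

$$\mathrm{Var}_{P^\theta}\big(\chi_{\{u>\bar u\}}\big) = \mathbb{E}_{P^\theta}\big[\chi_{\{u>\bar u\}}\big] - \big(\mathbb{E}_{P^\theta}\big[\chi_{\{u>\bar u\}}\big]\big)^2, \qquad
\mathbb{E}_{P^\theta}\big[\chi_{\{u>\bar u\}}\big] = \frac{1}{2} - \frac{1}{2}\,\mathrm{erf}\Big(\frac{\log(\bar u)-\log(u_0)+\mu t}{\sqrt{2}\,\sigma t}\Big),$$

where erf is the error function, while the diagonal elements of the FIM for the log-normal distribution, $P^\theta$,
are given by

$$\mathbf{I}(P^\theta)_{1,1} = \mathbb{E}_{P^\theta}\Big[\Big(\frac{\log(u)-\log(u_0)+\mu t}{\sigma^2 t}\Big)^2\Big]
\quad\text{and}\quad
\mathbf{I}(P^\theta)_{2,2} = \mathbb{E}_{P^\theta}\Big[\frac{\big((\log(u)-\log(u_0)+\mu t)^2-\sigma^2t^2\big)^2}{(\sigma^3t^2)^2}\Big].$$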
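For this example every quantity in the sensitivity bound has a closed form, since $\log u(t)$ is Gaussian with mean $\log(u_0)-\mu t$ and standard deviation $\sigma t$. The sketch below is our own worked illustration: differentiating the failure probability gives $S_\mu = -\varphi(z)/\sigma$ and $S_\sigma = \varphi(z)z/\sigma$ with $z = (\log\bar u-\log u_0+\mu t)/(\sigma t)$ and $\varphi$ the standard normal density, and evaluating the Gaussian moments in the FIM expressions gives diagonal elements $1/\sigma^2$ and $2/\sigma^2$. The parameter values are illustrative assumptions and are not claimed to be those used for the figures.

```python
import math

def standardized_threshold(mu, sigma, t, u0, u_bar):
    return (math.log(u_bar) - math.log(u0) + mu * t) / (sigma * t)

def failure_probability(mu, sigma, t, u0, u_bar):
    """P_f = P(u(t) > u_bar), cf. (4.2)."""
    z = standardized_threshold(mu, sigma, t, u0, u_bar)
    return 0.5 - 0.5 * math.erf(z / math.sqrt(2.0))

def sensitivity_indices(mu, sigma, t, u0, u_bar):
    """Derivatives of P_f with respect to mu and sigma."""
    z = standardized_threshold(mu, sigma, t, u0, u_bar)
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return -phi / sigma, phi * z / sigma

def sensitivity_bounds(mu, sigma, t, u0, u_bar):
    """sqrt(Var(chi)) * sqrt(FIM diagonal element) for mu and for sigma."""
    p = failure_probability(mu, sigma, t, u0, u_bar)
    std = math.sqrt(p * (1.0 - p))
    return std * math.sqrt(1.0 / sigma ** 2), std * math.sqrt(2.0 / sigma ** 2)

# Illustrative parameter values (assumptions for this sketch)
mu, sigma, u0, u_bar = 1.0, 1.0, 1.0, 10.0
for t in (0.5, 1.0, 2.0, 5.0, 10.0):
    s_mu, s_sigma = sensitivity_indices(mu, sigma, t, u0, u_bar)
    b_mu, b_sigma = sensitivity_bounds(mu, sigma, t, u0, u_bar)
    print(t, abs(s_mu), b_mu, abs(s_sigma), b_sigma)
```

The index for $\sigma$ vanishes whenever $z = 0$, which is the kind of zero crossing at which a variance-based bound cannot be sharp.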

Figures 4.1 and 4.2 show the absolute value of the sensitivity indices and the corresponding sensitivity
bounds as a function of time for $\sigma = 1$ and $\sigma = 2$, respectively. The remaining parameters were set to
mq = Ij h = 10 and ^ = 1 while the computations of the expectations were performed numerically, whenever 
necessary. In Figure 4.1 the sensitivity bound of Theorem 2.13 follows closely the sensitivity index in the
course of time. The sensitivity bound in Figure 4.2 performs accurately for the sensitivity index of the mean
(upper panel); however, it is less sharp around time $t = 5$ for the standard deviation (lower panel) due to
the existence of a zero transition of the sensitivity index at that particular instant. Interestingly, the lower
panel of Figure 4.2 reveals that both upper and lower bounds for small time and larger times, respectively,
provide information about the corresponding sensitivity index. Taking into account the complexity of the
chosen observable, which can be a risk-sensitive (i.e., rare event) observable when $t$ is large, we would like to
emphasize that even in this difficult case there always exists a guaranteed bound for the sensitivity indices
and in that sense one cannot but benefit from its use.

Fig. 4.1. Upper panel: Sensitivity index for the mean value (red) and the sensitivity bound from Theorem 2.13 (blue).
Lower panel: Sensitivity index for the standard deviation (red) and the respective sensitivity bound (blue). In both panels $\sigma = 1$.


Fig. 4.2. Upper panel: Sensitivity index for the mean value (red) and the sensitivity bound from Theorem 2.13 (blue).
Lower panel: Sensitivity index for the standard deviation (red) and the respective sensitivity bound (blue). In both panels $\sigma = 2$.


4.3. Birth/death process. We consider a well-mixed reaction network which consists of one species
and two reactions given by

$$\emptyset \xrightarrow{\;k_1\;} X, \qquad X \xrightarrow{\;k_2\;} \emptyset.$$

The corresponding propensity functions for the current state $x$ are

$$a_1(x) = k_1 \quad\text{and}\quad a_2(x) = k_2 x.$$

Mathematically, this stochastic system is modeled as a continuous-time Markov chain (CTMC) and due to
its simplicity there exist analytic representations of the steady state (equilibrium) distribution, moments
and autocorrelation function, [10, Sec. 7.1]. The steady state distribution, $\mu^\theta$, of the reaction network is
Poisson with parameter $k_1/k_2$. Hence, the steady state moments as well as the FIM for the parameter vector
$\theta = [k_1,k_2]^T$ are known. The elements of the stationary FIM (eq. (2.30)) are shown in Table 4.1. In
the same Table, the elements of the path FIM are shown (an pp. 10 ]. Notice that the stationary FIM is 
singular while the path FIM is full rank implying that when the complete time-series is provided then both 
parameters can be inferred. If samples were i.i.d. drawn from the steady state distribution, then only the 
parameter ratio is inferable. Next, we consider two observables, the mean, fi{x) = x, and, the variance, 

Table 4.1
Stationary and path-wise FIM's elements.

Matrix element    Stationary FIM, $\mathbf{I}(\mu^\theta)$    Path FIM, $\mathbf{I}_{\mathcal{H}}(P^\theta)$
(1,1)             $1/(k_1 k_2)$                               $1/k_1$
(1,2)             $-1/k_2^2$                                  $0$
(2,2)             $k_1/k_2^3$                                 $k_1/k_2^2$

$f_2(x) = (x - k_1/k_2)^2$. Since $\mathbb{E}_{\mu^\theta}[f_1] = \mathbb{E}_{\mu^\theta}[f_2] = k_1/k_2$, the sensitivity indices are $S_{f_i,k_1}(\mu^\theta) = 1/k_2$ and
$S_{f_i,k_2}(\mu^\theta) = -k_1/k_2^2$, $i = 1,2$. Moreover, in order to compute the IAT for $f_1$ and $f_2$, the computation of
the autocorrelation and the autocorrelation of the variance are necessary. Due to the linear nature of this
example, explicit formulae exist and they are reported in Table 4.2. The corresponding IATs are also
shown in Table 4.2 for both observable functions. In Table 4.3 both stationary and path-wise sensitivity
bounds are compared to the actual sensitivity indices. The Poisson distribution belongs to the exponential
family, hence we have a sharp bound for the mean value and the stationary case while the bound for the
path-wise case is worse by a factor of $\sqrt{2}$. When the variance is considered as observable, the stationary bound
is also slightly tighter than the path-wise bound. In the latter case, the path-wise bound becomes equivalent
to the stationary bound when $k_2 \ll k_1$, while both bounds become sharper when $k_1 \ll k_2$.

Table 4.2
Variance, autocorrelation function and IAT for the observables $f_1(x)$ and $f_2(x)$ of the birth/death process.

Observable                   Variance                                ACF                                                                  IAT
$f_1(x) = x$                 $k_1/k_2$                               $\frac{k_1}{k_2}e^{-k_2|t|}$                                         $2k_1/k_2^2$
$f_2(x) = (x-k_1/k_2)^2$     $\frac{k_1}{k_2}+\frac{2k_1^2}{k_2^2}$   $\frac{k_1}{k_2}e^{-k_2|t|}+\frac{2k_1^2}{k_2^2}e^{-2k_2|t|}$       $\frac{2k_1}{k_2^2}+\frac{2k_1^2}{k_2^3}$

Finally, note that

even though we have comparable performance for the stationary and path-wise bounds, there is a crucial 
advantage of the path-wise analysis which is its computational tractability. Indeed, in complex reaction 
networks, the steady state distribution is rarely known, hence, the stationary FIM cannot be derived. On 
the other hand, explicit formulas for the path-wise FIM exists m and the corresponding sensitivity bound 
is computable through Monte Carlo sampling. 
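The Monte Carlo computation mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the path FIM is the stationary average of $\sum_j a_j(x)\nabla_\theta\log a_j(x)\nabla_\theta\log a_j(x)^T$ (the CTMC formula (A.5) of Appendix A), and it estimates that average by a time average along one trajectory generated with Gillespie's stochastic simulation algorithm. The function name `ssa_path_fim`, the rate values and the time horizon are hypothetical choices made for the example.

```python
import numpy as np

def ssa_path_fim(k1, k2, t_end, x0=0, seed=0):
    """Estimate the path FIM of the birth/death process from one SSA trajectory.

    Between jumps the integrand sum_j a_j(x) grad(log a_j) grad(log a_j)^T is
    constant, so its time average is accumulated exactly over holding times.
    """
    rng = np.random.default_rng(seed)
    t, x = 0.0, x0
    fim = np.zeros((2, 2))
    while t < t_end:
        birth, death = k1, k2 * x              # propensities a_1(x), a_2(x)
        total = birth + death
        hold = min(rng.exponential(1.0 / total), t_end - t)
        # grad_theta log a_1 = (1/k1, 0) and grad_theta log a_2 = (0, 1/k2);
        # the death term carries zero weight when x = 0.
        fim[0, 0] += hold * birth / k1**2
        fim[1, 1] += hold * death / k2**2
        t += hold
        if t < t_end:                          # a jump occurred before t_end
            x += 1 if rng.random() * total < birth else -1
    return fim / t_end

# Illustrative parameter values (assumptions, not taken from the paper).
k1, k2 = 10.0, 1.0
estimate = ssa_path_fim(k1, k2, t_end=2000.0, x0=10, seed=1)
closed_form = np.diag([1.0 / k1, k1 / k2**2])
print(estimate)
print(closed_form)
```

Only the propensities and their parameter gradients enter the estimator, so the same loop applies to networks whose steady state distribution is not available in closed form.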


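Table 4.3
Sensitivity indices and the corresponding sensitivity bounds for the mean value and the variance of the birth/death process.

SI                           Value            SB (Thm 2.13)                               SB (Thm 3.9)
$S_{f_1,k_1}(\mu^\theta)$    $1/k_2$          $1/k_2$                                     $\sqrt{2}/k_2$
$S_{f_1,k_2}(\mu^\theta)$    $-k_1/k_2^2$     $k_1/k_2^2$                                 $\sqrt{2}\,k_1/k_2^2$
$S_{f_2,k_1}(\mu^\theta)$    $1/k_2$          $\frac{1}{k_2}\sqrt{1 + 2k_1/k_2}$          $\frac{\sqrt{2}}{k_2}\sqrt{1 + k_1/k_2}$
$S_{f_2,k_2}(\mu^\theta)$    $-k_1/k_2^2$     $\frac{k_1}{k_2^2}\sqrt{1 + 2k_1/k_2}$      $\frac{\sqrt{2}\,k_1}{k_2^2}\sqrt{1 + k_1/k_2}$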




4.4. Ornstein-Uhlenbeck process. Consider a one-dimensional Ornstein-Uhlenbeck (OU) process
defined by the stochastic differential equation

$$dX_t = -\alpha (X_t - \beta)\,dt + \gamma\,dB_t ,$$

where $\theta = [\alpha, \beta, \gamma]^T$ are the system's parameters while $B_t$ is a one-dimensional Brownian motion. The
stationary distribution of the OU process, $\mu^\theta$, is Gaussian with mean $\beta$ and variance $\gamma^2/(2\alpha)$. The diagonal
elements of the stationary FIM are presented in Table 4.5 (2nd column). Taking $f(x) = x$ as an observable,
Table 4.4 reports the variance with respect to the stationary measure, the autocorrelation function as well
as the IAT for the continuous-time process.

There are two approaches for the computation of the path-wise FIM. The first is to compute the RER directly
from the Girsanov formula; the FIM is then obtained from a linearization procedure. The formula for the
RER is given in (A.9); thus, it is straightforward to calculate the path-wise FIM, whose diagonal elements
are shown in Table 4.5. Notice that if the diffusion parameter, $\gamma$, is perturbed by a small amount then
the RER is infinite. Indeed, by Girsanov's Theorem the path-space measures of two SDE processes are not
absolutely continuous with respect to each other when the diffusion terms are different [13], [29]. Therefore,
the path-wise sensitivity bound in continuous time is applicable only to the parameters of the drift. Clearly,
in the OU case a simple rescaling can remove the parameter from the noise term and bypass this issue
altogether. The second approach is to discretize the stochastic process, defining a new discrete-time Markov
chain (DTMC), and then compute the path-wise FIM from the FIM of the DTMC renormalized with the time-step
[30]. Even though the second approach is an approximation, it is more flexible since it provides a sensitivity
bound even when the diffusion parameters are considered. Overall, the time-discretization results in a
regularization of the new path-space measures; hence, a finite RER is obtained even if the parameters of the
diffusion part are perturbed. Following the second approach, we consider the Euler scheme for the OU process,
which is a first-order weak error integrator [18], given at the n-th step by

$$X_{n+1} = X_n - \alpha (X_n - \beta)\,\Delta t + \gamma \sqrt{\Delta t}\,\Delta W_n ,$$

where $\Delta t$ is the discretization step while $\Delta W_n$ are i.i.d. zero-mean Gaussians with unit variance. Hence,
the transition probability, $p^\theta(x,y)$, is Gaussian with mean $x - \alpha(x - \beta)\Delta t$ and variance $\gamma^2 \Delta t$. The last
two columns of Table 4.4 show the autocorrelation function as well as the IAT for the discrete-time process
obtained after discretization using the Euler scheme, while the last column of Table 4.5 shows the diagonal
elements of the path-wise FIM, again for the same discrete-time process. In order to compute these
quantities, averaging with respect to the (unknown) stationary distribution of the Euler scheme, $\bar\mu^\theta$, which
is an approximation of the stationary distribution $\mu^\theta$ of the continuous-time process, is required. However,
we averaged with respect to $\mu^\theta$ instead of $\bar\mu^\theta$, exploiting the fact that the produced weak error is of order
$O(\Delta t)$ [22]. Another remark on the path-wise FIM is that when the limit $\Delta t \to 0$ is taken and the diffusion
parameter, $\gamma$, is perturbed, the corresponding FIM value is infinite, which is in accordance with the
Girsanov Theorem restrictions mentioned earlier.

Table 4.4
Variance, autocorrelation function and IAT for the mean value as an observable of the OU process. Both
continuous-time and discrete-time (Euler discretization) are considered.

Observable    Variance                ACF (cont. time)                              IAT (cont. time)       ACF (Euler)    IAT (Euler)
$f(x) = x$    $\gamma^2/(2\alpha)$    $\frac{\gamma^2}{2\alpha} e^{-\alpha|t|}$     $\gamma^2/\alpha^2$    …              …
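The second approach can be illustrated numerically. The sketch below is an assumption-laden example rather than the authors' code: it takes the path-wise FIM of the Euler chain to be the expected outer product of the score $\nabla_\theta \log p^\theta(x,y)$ divided by $\Delta t$, draws $x$ from the continuous-time stationary law $N(\beta, \gamma^2/(2\alpha))$ (the averaging choice described above), and draws $y$ from the Gaussian Euler transition. The function name `euler_ou_path_fim` and the parameter values are hypothetical.

```python
import numpy as np

def euler_ou_path_fim(alpha, beta, gamma, dt, n=200_000, seed=0):
    """Monte Carlo estimate of the time-step-renormalized FIM of the Euler chain."""
    rng = np.random.default_rng(seed)
    # x ~ N(beta, gamma^2 / (2 alpha)): stationary law of the continuous-time OU process.
    x = beta + gamma / np.sqrt(2.0 * alpha) * rng.standard_normal(n)
    # Euler transition: y = m(x) + gamma * sqrt(dt) * xi, with m(x) = x - alpha (x - beta) dt.
    xi = rng.standard_normal(n)
    # Scores of log p(x, y) with respect to (alpha, beta, gamma), written in terms of xi.
    s_alpha = -xi * (x - beta) * np.sqrt(dt) / gamma
    s_beta = xi * alpha * np.sqrt(dt) / gamma
    s_gamma = (xi**2 - 1.0) / gamma
    scores = np.stack([s_alpha, s_beta, s_gamma])      # shape (3, n)
    return scores @ scores.T / (n * dt)

# Illustrative values (assumptions): alpha = 1, beta = 0, gamma = 1.
for dt in (1e-2, 1e-3):
    fim = euler_ou_path_fim(1.0, 0.0, 1.0, dt)
    print(dt, np.diag(fim))
```

Under these assumptions the first two diagonal entries stay near $1/(2\alpha)$ and $\alpha^2/\gamma^2$ as $\Delta t$ decreases, while the entry for $\gamma$ grows like $2/(\gamma^2 \Delta t)$, which is the divergence discussed in the text.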


Table 4.5
Diagonal elements of the stationary and path-wise FIMs for the Ornstein-Uhlenbeck process. Path-wise FIM for both
continuous-time and discrete-time approximation (Euler scheme) are considered.

Matrix element    Stationary FIM        Path FIM (cont. time)    Path FIM (Euler)
(1,1)             $1/(2\alpha^2)$       $1/(2\alpha)$            $1/(2\alpha)$
(2,2)             $2\alpha/\gamma^2$    $\alpha^2/\gamma^2$      $\alpha^2/\gamma^2$
(3,3)             $2/\gamma^2$          $\infty$                 $2/(\gamma^2 \Delta t)$


Table 4.6 presents the sensitivity indices and the various sensitivity bounds for the mean value as an
observable. The stationary bound for $\beta$ is sharp, as expected, because the Gaussian distribution belongs to
the exponential family and the mean value is a sufficient statistic. The continuous-time path-wise bound
as well as the discrete-time path-wise bound (up to order $O(\Delta t)$) for $\beta$ are also sharp. For $\alpha$, the stationary
bound is smaller by a factor of $\sqrt{2}$, while for $\gamma$ the $\Delta t$-dependent factor of the discrete-time path-wise bound
makes the stationary bound better. Finally, notice that, as in the birth/death process, the stationary bounds
are slightly tighter. However, for general SDEs where the drift term is not necessarily of conservative type,
the stationary distribution is rarely known; hence the computation of the stationary FIM, and consequently of
the stationary bounds, is intractable. For instance, a large class of stochastic processes where the stationary
distribution is not known consists of the non-equilibrium systems where the drift is a non-conservative force
while the noise is additive [33], [22]. Therefore, the respective stationary sensitivity bound is intractable for
this important category of stochastic processes, while the path-wise bound (3.37) is computable.

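Table 4.6
Sensitivity indices and the corresponding sensitivity bounds for the mean value of the OU process.

SI                          Value    SB (Thm 2.13)                        SB (Thm 3.9, cont. time)                    SB (Thm 3.9, Euler)
$S_{x,\alpha}(\mu^\theta)$  $0$      $\gamma/(2\alpha\sqrt{\alpha})$      $\gamma/(\sqrt{2}\,\alpha\sqrt{\alpha})$    $\gamma/(\sqrt{2}\,\alpha\sqrt{\alpha})$
$S_{x,\beta}(\mu^\theta)$   $1$      $1$                                  $1$                                         $1$
$S_{x,\gamma}(\mu^\theta)$  $0$      $1/\sqrt{\alpha}$                    $\infty$                                    $\sqrt{2}/(\alpha\sqrt{\Delta t})$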


5. Conclusions. In this paper, we derived information inequalities that bound weak error estimates
and sensitivity indices. We further extended the variational UQ bounds, which were previously derived in
[?], in several directions. First, we observed and proved that the UQ bound defines a novel goal-oriented
divergence which couples observables of interest (hence the term “goal-oriented”) with the relative entropy
of the “true” probabilistic model with respect to a computationally tractable “nominal” model. Second, an
implicit representation for the goal-oriented divergence was derived, which after linearization resulted in a
sensitivity bound that decouples the role of the observable function from the distance of the probability
measures as quantified by the FIM. Exploiting the properties of the relative entropy in path-space, we further
extended the UQ and sensitivity bounds to the case of stochastic dynamics for both transient and long-time
regimes. The relative entropy rate, which is the relative entropy per unit time, and the corresponding path
FIM are the quantities that control the weak error and the sensitivity indices, respectively, at infinite times.
An advantage of the path-space sensitivity bounds is that they depend only on the local dynamics of the
process; thus they are computable from a direct Monte Carlo simulation. This feature is very attractive in
out-of-equilibrium or non-equilibrium systems, where the stationary distribution is not relevant or known.
Finally, this paper is primarily a theoretical work; extensive numerical examples, algorithms, synergies
with other methods and applications to high-dimensional realistic systems will follow.

Appendix A. Relative entropy rate and path Fisher information matrix: Examples. The
relative entropy rate (RER) and the path Fisher Information Matrix (pFIM) can often be expressed explicitly
in terms of the local dynamics, which we demonstrate in a few examples for Markov processes, including
discrete and continuous time Markov chains and stochastic differential equations.

A.1. Discrete-time Markov chains. RER always has an explicit expression for discrete-time processes
with values in the Polish space $\mathcal{X}$. We first state a version of the chain rule. For a proof see
[8, Theorem C.3.1].

Lemma A.1. Let $\alpha$ and $\beta$ be probability measures on $\mathcal{X} \times \mathcal{Y}$, where $\mathcal{X}$ and $\mathcal{Y}$ are Polish spaces. Let $\alpha_1$
and $\beta_1$ denote their first marginals, and denote by $\alpha(dy|x)$ and $\beta(dy|x)$ the conditional distributions on the
second variable given the first. Then the mapping $x \to \mathcal{R}(\alpha(\cdot|x) \,\|\, \beta(\cdot|x))$ is measurable, and

$$\mathcal{R}(\alpha \,\|\, \beta) = \mathcal{R}(\alpha_1 \,\|\, \beta_1) + \int_{\mathcal{X}} \mathcal{R}\big(\alpha(\cdot|x) \,\|\, \beta(\cdot|x)\big)\, \alpha_1(dx).$$

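Lemma A.2. Let $\{X_t\}_{t\in\mathbb{N}_0}$ and $\{Y_t\}_{t\in\mathbb{N}_0}$ be Markov processes on the state space $\mathcal{X}$ with transition kernels
$q(x,dx')$ and $p(x,dx')$, and initial measures $\nu(dx)$ and $\mu(dx)$, respectively. Assume that $\nu$ is stationary for
$q(x,dx')$. Then the relative entropy rate $\mathcal{H}(Q \,\|\, P)$ defined in (3.3) is given by

$$\mathcal{H}(Q \,\|\, P) = \int_{\mathcal{X}} \nu(dx) \int_{\mathcal{X}} q(x,dy) \log \frac{dq(x,\cdot)}{dp(x,\cdot)}(y). \tag{A.1}$$

Furthermore, the relative entropy rate is expressed as the relative entropy

$$\mathcal{H}(Q \,\|\, P) = \mathcal{R}(\nu \otimes q \,\|\, \nu \otimes p), \tag{A.2}$$

where $\nu \otimes q$ is the probability measure on $\mathcal{X} \times \mathcal{X}$ given by $[\nu \otimes q](A \times B) = \int_A q(x,B)\,\nu(dx)$.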

Proof. Both statements follow directly from the chain rule, Lemma A.1. Since $\nu$ is stationary for
$q(x,dx')$, we can apply the chain rule from time $t = T - 1$ back to $t = 0$ and, by using the Markov property,
obtain (3.3) with $\mathcal{H}(Q \,\|\, P)$ equal to

$$\int_{\mathcal{X}} \mathcal{R}\big(q(x,\cdot) \,\|\, p(x,\cdot)\big)\, \nu(dx).$$

However, this is precisely (A.1), and thus the first claim follows. (A.2) also follows directly from the chain
rule and the fact that $\mathcal{R}(\nu \,\|\, \nu) = 0$. Finally, notice that, even though it is a quantity between path
distributions, we drop the dependence on the time interval in the notation of the relative entropy rate
because the relative entropy rate is a time-independent quantity. □
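The content of Lemma A.2 can be checked on a small example. The following sketch assumes a two-state chain with strictly positive transition matrices (the matrices `q`, `p` and the initial law `mu` are made-up values, and all function names are hypothetical). It enumerates every path of length $T$ to compute the path-space relative entropy by brute force and compares it with $T\,\mathcal{H}(Q\,\|\,P) + \mathcal{R}(\nu\,\|\,\mu)$, where $\mathcal{H}$ is evaluated through (A.1).

```python
import itertools
import numpy as np

def stationary(q):
    """Stationary distribution of a row-stochastic matrix (left eigenvector for eigenvalue 1)."""
    w, v = np.linalg.eig(q.T)
    nu = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return nu / nu.sum()

def relative_entropy(a, b):
    """Relative entropy between two strictly positive discrete distributions."""
    return float(np.sum(a * np.log(a / b)))

def rer_dtmc(q, p, nu):
    """Relative entropy rate via (A.1): sum_x nu(x) sum_y q(x,y) log(q(x,y)/p(x,y))."""
    return float(np.sum(nu[:, None] * q * np.log(q / p)))

def path_relative_entropy(q, p, nu, mu, T):
    """Brute-force relative entropy between the laws of (X_0..X_T) and (Y_0..Y_T)."""
    total = 0.0
    for path in itertools.product(range(q.shape[0]), repeat=T + 1):
        prob_q, prob_p = nu[path[0]], mu[path[0]]
        for x, y in zip(path[:-1], path[1:]):
            prob_q *= q[x, y]
            prob_p *= p[x, y]
        total += prob_q * np.log(prob_q / prob_p)
    return float(total)

# Illustrative chains (assumed values).
q = np.array([[0.9, 0.1], [0.3, 0.7]])
p = np.array([[0.8, 0.2], [0.4, 0.6]])
nu = stationary(q)                 # X starts from its stationary law
mu = np.array([0.5, 0.5])          # arbitrary initial law for Y

T = 5
print(path_relative_entropy(q, p, nu, mu, T))
print(T * rer_dtmc(q, p, nu) + relative_entropy(nu, mu))
```

The linear growth in $T$ is what makes the rate a time-independent quantity; the initial-law term $\mathcal{R}(\nu\,\|\,\mu)$ does not grow with $T$.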

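Lemma A.3. Assume Condition 3.1. Then, the path FIM defined in (3.4) is given by

$$\mathbf{I}_{\mathcal{H}}(P^\theta) = \mathbb{E}_{\mu^\theta}\Big[ \int_{\mathcal{X}} p^\theta(x,y)\, \nabla_\theta \log p^\theta(x,y)\, \nabla_\theta \log p^\theta(x,y)^T\, R(dy) \Big]. \tag{A.3}$$
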
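Proof. Define the function $G(\theta) = G(\theta; x, y) = \log p^\theta(x,y)$ for all $x, y \in \mathcal{X}$. Then, from Condition 3.1,
$G(\theta)$ as a function of $\theta$ is sufficiently smooth and for an arbitrary $\epsilon \in \mathbb{R}^k$

$$G(\theta + \epsilon) = G(\theta) + \epsilon^T \nabla_\theta G(\theta) + \tfrac{1}{2}\epsilon^T \nabla^2_\theta G(\theta)\,\epsilon + R_2(\theta)
  = \log p^\theta + \epsilon^T \frac{\nabla_\theta p^\theta}{p^\theta} + \frac{1}{2}\epsilon^T \Big( \frac{\nabla^2_\theta p^\theta}{p^\theta} - \frac{\nabla_\theta p^\theta\, \nabla_\theta p^{\theta\,T}}{(p^\theta)^2} \Big)\epsilon + R_2(\theta),$$

where $\nabla_\theta$ and $\nabla^2_\theta$ denote the gradient and the Hessian of a function while $R_2(\theta)$ is the remainder term as
given by Taylor's Theorem. Then, the relative entropy rate of the path distribution $P^\theta$ with respect to
the perturbed path distribution $P^{\theta+\epsilon}$ becomes

$$\mathcal{H}(P^\theta \,\|\, P^{\theta+\epsilon}) = \int \mu^\theta(dx) \int p^\theta(x,y) \log \frac{p^\theta(x,y)}{p^{\theta+\epsilon}(x,y)}\, R(dy)$$
$$= -\int \mu^\theta(dx) \int p^\theta(x,y) \big( G(\theta+\epsilon; x, y) - G(\theta; x, y) \big)\, R(dy)$$
$$= -\int \mu^\theta(dx) \int p^\theta(x,y) \Big[ \epsilon^T \nabla_\theta G(\theta) + \tfrac{1}{2}\epsilon^T \nabla^2_\theta G(\theta)\,\epsilon \Big] R(dy) - \int \mu^\theta(dx) \int p^\theta(x,y)\, R_2(\theta; x, y)\, R(dy)$$
$$= \frac{1}{2}\epsilon^T \Big( \int \mu^\theta(dx) \int \frac{\nabla_\theta p^\theta(x,y)\, \nabla_\theta p^\theta(x,y)^T}{p^\theta(x,y)}\, R(dy) \Big)\epsilon + O(|\epsilon|^3),$$

since for any $i = 1, 2, \ldots$ it holds that

$$\int \nabla^i_\theta\, p^\theta(x,y)\, R(dy) = \nabla^i_\theta \int p^\theta(x,y)\, R(dy) = \nabla^i_\theta 1 = 0,$$

where $\nabla^i_\theta$ denotes the i-th derivative operator. Thus, the path FIM is given by

$$\mathbf{I}_{\mathcal{H}}(P^\theta) = \mathbb{E}_{\mu^\theta}\Big[ \int_{\mathcal{X}} p^\theta(x,y)\, \nabla_\theta \log p^\theta(x,y)\, \nabla_\theta \log p^\theta(x,y)^T\, R(dy) \Big].$$

□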

Remark A.1. Performing a similar Taylor series expansion, it can be obtained that the relative entropy
rate of $P^{\theta+\epsilon}$ w.r.t. $P^\theta$ admits the same Hessian. Indeed, it is expanded as

$$\mathcal{H}(P^{\theta+\epsilon} \,\|\, P^\theta) = \tfrac{1}{2}\epsilon^T \mathbf{I}_{\mathcal{H}}(P^\theta)\,\epsilon + O(|\epsilon|^3).$$

Notice also that this result is valid not only for discrete-time Markov chains but it is quite general.
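The quadratic expansion can be verified numerically on a toy model. The sketch below assumes a two-state chain whose off-diagonal transition probabilities are logistic functions of $\theta = (\theta_1, \theta_2)$; this parametrization, the step size `h` and all function names are hypothetical choices for illustration. The path FIM is assembled from (A.3) with finite-difference scores and compared with the RER of a small perturbation.

```python
import numpy as np

def transition(theta):
    """Two-state chain with p(0,1) = sigmoid(theta_1) and p(1,0) = sigmoid(theta_2)."""
    s = 1.0 / (1.0 + np.exp(-np.asarray(theta, dtype=float)))
    return np.array([[1.0 - s[0], s[0]], [s[1], 1.0 - s[1]]])

def stationary2(P):
    """Stationary law of a two-state chain: proportional to (p(1,0), p(0,1))."""
    mu = np.array([P[1, 0], P[0, 1]])
    return mu / mu.sum()

def path_fim(theta, h=1e-5):
    """Path FIM from (A.3), with scores obtained by central finite differences."""
    theta = np.asarray(theta, dtype=float)
    P, k = transition(theta), len(theta)
    mu = stationary2(P)
    grads = np.zeros((k,) + P.shape)
    for i in range(k):
        e = np.zeros(k)
        e[i] = h
        grads[i] = (np.log(transition(theta + e)) - np.log(transition(theta - e))) / (2 * h)
    # I[i, j] = sum_x mu(x) sum_y p(x, y) d_i log p(x, y) d_j log p(x, y)
    return np.einsum('x,xy,ixy,jxy->ij', mu, P, grads, grads)

def rer(theta, eps):
    """RER of P^theta with respect to P^(theta + eps), using (A.1)."""
    theta, eps = np.asarray(theta, dtype=float), np.asarray(eps, dtype=float)
    P, P_eps = transition(theta), transition(theta + eps)
    return float(np.sum(stationary2(P)[:, None] * P * np.log(P / P_eps)))

theta = np.array([0.3, -0.5])
eps = 1e-2 * np.array([1.0, -2.0])
print(rer(theta, eps), 0.5 * eps @ path_fim(theta) @ eps)
```

The two printed numbers agree up to a higher-order remainder in $|\epsilon|$, so shrinking `eps` tightens the match.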

A.2. Continuous-time Markov chains. Next, we compute the relative entropy rate for continuous-time
Markov chains. We consider such chains on a countable state space $\mathcal{X}$ and let quantities such as $P_{[0,T]}$
denote the measure on $D([0,T] : \mathcal{X})$ induced by the process, where $D([0,T] : \mathcal{X})$ consists of all maps
$[0,T] \to \mathcal{X}$ that are continuous from the right and with limits from the left, with the usual Skorohod topology.

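Lemma A.4. Let $\{X_t\}_{t\ge 0}$ and $\{Y_t\}_{t\ge 0}$ be stationary continuous time Markov chains with the countable
state space $\mathcal{X}$ and jump rates $\tilde\lambda(x)$ and $\lambda(x)$ and transition probabilities $\tilde p(x,x')$ and $p(x,x')$. Assume that
$\tilde\lambda$ and $\lambda$ are positive and uniformly bounded above. Assume also that $\tilde p(x,x) = p(x,x) = 0$ for all $x \in \mathcal{X}$,
and for $x' \ne x$ that $\tilde p(x,x') > 0$ if and only if $p(x,x') > 0$. Let $\tilde\mu$ be a stationary probability distribution for
$\{X_t\}_{t\ge 0}$, and let $\mu$ be any initial distribution for $\{Y_t\}_{t\ge 0}$. Let $Q_{[0,T]}$ and $P_{[0,T]}$ be the measures induced by
$\{X_t\}_{t\ge 0}$ and $\{Y_t\}_{t\ge 0}$. Then the relative entropy rate $\mathcal{H}(Q \,\|\, P)$ associated with $\mathcal{R}(Q_{[0,T]} \,\|\, P_{[0,T]})$ is given by

$$\mathcal{H}(Q \,\|\, P) = \sum_{x\in\mathcal{X}} \sum_{x'\in\mathcal{X}} \tilde\mu(x)\,\tilde\lambda(x)\,\tilde p(x,x') \log \frac{\tilde\lambda(x)\,\tilde p(x,x')}{\lambda(x)\,p(x,x')} - \sum_{x\in\mathcal{X}} \tilde\mu(x)\big(\tilde\lambda(x) - \lambda(x)\big). \tag{A.4}$$

Proof. According to [?, Prop. 2.6, App. 1] and [12, Sec. 19] the Radon-Nikodym derivative of the path
measure $Q_{[0,T]}$ with respect to the path measure $P_{[0,T]}$ is given by

$$\frac{dQ_{[0,T]}}{dP_{[0,T]}}(X) = \frac{\tilde\mu(X_0)}{\mu(X_0)} \exp\Big\{ \int_0^T \log \frac{\tilde\lambda(X_{t-})\,\tilde p(X_{t-}, X_t)}{\lambda(X_{t-})\,p(X_{t-}, X_t)}\, dN_t(X) - \int_0^T \big(\tilde\lambda(X_t) - \lambda(X_t)\big)\, dt \Big\},$$

where $N_s(X)$ is the number of jumps on the path $X$ up to time $s$. The relative entropy up to time $T$ is
defined by

$$\mathcal{R}(Q_{[0,T]} \,\|\, P_{[0,T]}) = \mathbb{E}_{Q_{[0,T]}}\Big[ \log \frac{dQ_{[0,T]}}{dP_{[0,T]}} \Big].$$

Since $\tilde\lambda$ is bounded, $M_t = N_t - \int_0^t \tilde\lambda(X_s)\, ds$ is a mean zero martingale under $Q_{[0,T]}$; then for any
(non-negative and measurable) function $f$ on $\mathcal{X}$

$$\mathbb{E}_{Q_{[0,T]}}\Big[ \int_0^T f(X_t)\, dN_t \Big] = \mathbb{E}_{Q_{[0,T]}}\Big[ \int_0^T f(X_t)\,\tilde\lambda(X_t)\, dt \Big].$$

Furthermore, from stationarity we have $\mathbb{E}_{Q_{[0,T]}}\big[ \int_0^T f(X_t)\,\tilde\lambda(X_t)\, dt \big] = T \sum_{x\in\mathcal{X}} \tilde\mu(x) f(x) \tilde\lambda(x)$. Substituting for
$f$ the expression for the logarithm of the Radon-Nikodym derivative we obtain

$$\mathcal{R}(Q_{[0,T]} \,\|\, P_{[0,T]}) = T \Big( \sum_{x\in\mathcal{X}} \sum_{x'\in\mathcal{X}} \tilde\mu(x)\,\tilde\lambda(x)\,\tilde p(x,x') \log \frac{\tilde\lambda(x)\,\tilde p(x,x')}{\lambda(x)\,p(x,x')} - \sum_{x\in\mathcal{X}} \tilde\mu(x)\big(\tilde\lambda(x) - \lambda(x)\big) \Big) + \mathcal{R}(\tilde\mu \,\|\, \mu).$$

□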

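Remark A.2. We can rearrange the expression for the RER to obtain

$$\mathcal{H}(Q \,\|\, P) = \sum_{x\in\mathcal{X}} \sum_{x'\in\mathcal{X}} \tilde\mu(x)\,\lambda(x)\,p(x,x')\; \ell\Big( \frac{\tilde\lambda(x)\,\tilde p(x,x')}{\lambda(x)\,p(x,x')} \Big),$$

where $\ell(z) = z \log z - z + 1$ for $z > 0$, the quantities $\tilde\lambda$, $\tilde p$, $\tilde\mu$ refer to $\{X_t\}_{t\ge 0}$ and $\lambda$, $p$ refer to $\{Y_t\}_{t\ge 0}$. This
exhibits the RER as a form of relative entropy. The function $\ell(z)$, which appears in rate functions for the
large deviation theory of jump Markov processes [2], is non-negative and vanishes only at $z = 1$. Thus the
RER is non-negative, and equals zero if and only if the two chains are the same.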
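The equivalence between (A.4) and the rearranged form in terms of $\ell$ can be checked directly. The sketch below is illustrative only: it represents each chain by its off-diagonal rate matrix $c(x,x') = \lambda(x)p(x,x')$ on three states, with made-up rate values and hypothetical function names, and evaluates the RER both ways.

```python
import numpy as np

def ell(z):
    """ell(z) = z log z - z + 1, non-negative and zero only at z = 1."""
    return z * np.log(z) - z + 1.0

def stationary_ctmc(c):
    """Stationary law of a CTMC given its rate matrix c (zero diagonal)."""
    n = c.shape[0]
    gen = c - np.diag(c.sum(axis=1))            # generator, rows sum to zero
    a = np.vstack([gen.T, np.ones(n)])          # mu @ gen = 0 together with sum(mu) = 1
    b = np.append(np.zeros(n), 1.0)
    mu, *_ = np.linalg.lstsq(a, b, rcond=None)
    return mu

def rer_ctmc(mu, c_q, c_p):
    """RER via (A.4), with c = lambda * p and lambda(x) = sum_x' c(x, x')."""
    off = ~np.eye(len(mu), dtype=bool)
    w = np.broadcast_to(mu[:, None], c_q.shape)[off]
    log_term = np.sum(w * c_q[off] * np.log(c_q[off] / c_p[off]))
    return float(log_term - np.sum(mu * (c_q.sum(axis=1) - c_p.sum(axis=1))))

def rer_ctmc_ell(mu, c_q, c_p):
    """RER in the rearranged form: sum mu(x) c_p(x, x') ell(c_q(x, x') / c_p(x, x'))."""
    off = ~np.eye(len(mu), dtype=bool)
    w = np.broadcast_to(mu[:, None], c_q.shape)[off]
    return float(np.sum(w * c_p[off] * ell(c_q[off] / c_p[off])))

# Illustrative rate matrices (assumed values); c_q drives X, c_p drives Y.
c_q = np.array([[0.0, 1.0, 0.5], [2.0, 0.0, 1.0], [0.3, 0.7, 0.0]])
c_p = np.array([[0.0, 1.5, 0.4], [1.0, 0.0, 2.0], [0.6, 0.5, 0.0]])
mu_q = stationary_ctmc(c_q)
print(rer_ctmc(mu_q, c_q, c_p), rer_ctmc_ell(mu_q, c_q, c_p))
```

Because every summand in the second form is a non-negative weight times $\ell(\cdot) \ge 0$, non-negativity of the RER is visible term by term, which is not apparent from (A.4).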

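Lemma A.5. Let the transition rate defined for all $x, x' \in \mathcal{X}$ by $c^\theta(x,x') = \lambda^\theta(x)\,p^\theta(x,x')$ be parametrized
by $\theta \in \mathbb{R}^k$ and assume that the mapping $\theta \to c^\theta(\cdot,\cdot)$ is sufficiently smooth. Let $P^\theta_{[0,T]}$ (resp. $\mu^\theta$) be the path
(resp. stationary) measure of the associated process. Then, the path FIM is

$$\mathbf{I}_{\mathcal{H}}(P^\theta) = \mathbb{E}_{\mu^\theta}\Big[ \sum_{x'\in\mathcal{X}} c^\theta(x,x')\, \nabla_\theta \log c^\theta(x,x')\, \nabla_\theta \log c^\theta(x,x')^T \Big]. \tag{A.5}$$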


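Proof. The proof is similar to the DTMC case, using now two auxiliary functions defined by $G_1(\theta) =
G_1(\theta; x, x') = \log c^\theta(x,x')$ and $G_2(\theta) = G_2(\theta; x, x') = c^\theta(x,x')$ for all $x, x' \in \mathcal{X}$. For completeness, we present
the basic steps of the relative entropy expansion. The relative entropy rate of the path measure $P^\theta_{[0,T]}$ with
respect to the perturbed path measure $P^{\theta+\epsilon}_{[0,T]}$ is written as

$$\mathcal{H}(P^\theta \,\|\, P^{\theta+\epsilon}) = \sum_{x,x'\in\mathcal{X}} \mu^\theta(x)\, c^\theta(x,x') \log \frac{c^\theta(x,x')}{c^{\theta+\epsilon}(x,x')} - \sum_{x,x'\in\mathcal{X}} \mu^\theta(x)\big( c^\theta(x,x') - c^{\theta+\epsilon}(x,x') \big)$$
$$= -\sum_{x,x'\in\mathcal{X}} \mu^\theta(x)\, c^\theta(x,x') \big( G_1(\theta+\epsilon) - G_1(\theta) \big) + \sum_{x,x'\in\mathcal{X}} \mu^\theta(x) \big( G_2(\theta+\epsilon) - G_2(\theta) \big)$$
$$= -\sum_{x,x'\in\mathcal{X}} \mu^\theta(x)\, c^\theta(x,x') \Big( \epsilon^T \frac{\nabla_\theta c^\theta}{c^\theta} + \frac{1}{2}\epsilon^T \Big( \frac{\nabla^2_\theta c^\theta}{c^\theta} - \frac{\nabla_\theta c^\theta\, \nabla_\theta c^{\theta\,T}}{(c^\theta)^2} \Big)\epsilon + R_2(\theta; x, x') \Big)$$
$$\quad + \sum_{x,x'\in\mathcal{X}} \mu^\theta(x) \Big( \epsilon^T \nabla_\theta c^\theta(x,x') + \tfrac{1}{2}\epsilon^T \nabla^2_\theta c^\theta(x,x')\,\epsilon + \tilde R_2(\theta; x, x') \Big)$$
$$= \frac{1}{2}\epsilon^T \Big( \sum_{x,x'\in\mathcal{X}} \mu^\theta(x)\, \frac{\nabla_\theta c^\theta(x,x')\, \nabla_\theta c^\theta(x,x')^T}{c^\theta(x,x')} \Big)\epsilon - \sum_{x,x'\in\mathcal{X}} \mu^\theta(x)\big( c^\theta(x,x')\, R_2(\theta; x, x') - \tilde R_2(\theta; x, x') \big),$$

where $R_2(\theta)$ and $\tilde R_2(\theta)$ are the remainder terms of $G_1$ and $G_2$, respectively. □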
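As a worked instance of (A.5), the sketch below evaluates the path FIM of the birth/death process of Section 4 under its Poisson($k_1/k_2$) stationary law and contrasts it with the Fisher information of that Poisson law in the parameters $(k_1, k_2)$. The truncation level `n_max`, the rate values and the function names are assumptions made for the example; the closed-form stationary FIM is the standard Fisher information of a Poisson distribution pushed through the map $(k_1, k_2) \mapsto k_1/k_2$.

```python
from math import lgamma

import numpy as np

def poisson_pmf(lam, n_max):
    """Poisson probabilities on {0, ..., n_max} (truncated support)."""
    x = np.arange(n_max + 1)
    log_pmf = x * np.log(lam) - lam - np.array([lgamma(k + 1.0) for k in x])
    return x, np.exp(log_pmf)

def path_fim_birth_death(k1, k2, n_max=200):
    """Path FIM from (A.5) with rates c(x, x+1) = k1 and c(x, x-1) = k2 * x."""
    x, mu = poisson_pmf(k1 / k2, n_max)
    fim = np.zeros((2, 2))
    # Birth channel: grad log c = (1/k1, 0).
    fim[0, 0] = np.sum(mu * k1) / k1**2
    # Death channel: grad log c = (0, 1/k2); zero weight at x = 0.
    fim[1, 1] = np.sum(mu * k2 * x) / k2**2
    return fim

def stationary_fim_poisson(k1, k2):
    """Fisher information of Poisson(k1/k2) with respect to (k1, k2)."""
    lam = k1 / k2
    grad = np.array([1.0 / k2, -k1 / k2**2])    # gradient of lam in (k1, k2)
    return np.outer(grad, grad) / lam

k1, k2 = 10.0, 1.0                              # illustrative values
print(path_fim_birth_death(k1, k2))
print(stationary_fim_poisson(k1, k2))
print(np.linalg.matrix_rank(stationary_fim_poisson(k1, k2)))
```

The stationary matrix is an outer product and therefore has rank one, which is the numerical counterpart of the statement that only the ratio $k_1/k_2$ is identifiable from i.i.d. steady state samples.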

A.3. Stochastic differential equations. We also compute the relative entropy rate for Itô diffusion
processes. To avoid technical difficulties we impose the following assumptions: we assume that the vector
fields $a(x), b(x) \in \mathbb{R}^d$, $x \in \mathbb{R}^d$, and the non-singular matrix $\sigma(x) \in \mathbb{R}^{d\times d}$ are such that the Itô stochastic
differential equations

$$dX_t = a(X_t)\,dt + \sigma(X_t)\,dW_t , \tag{A.6}$$
$$dY_t = b(Y_t)\,dt + \sigma(Y_t)\,dW_t , \tag{A.7}$$

have a unique weak solution for initial conditions $X_0 \sim \nu_0(dx)$ and $Y_0 \sim \mu_0(dx)$. Furthermore, we assume
that the function

$$u(x) = \sigma^{-1}(x)\big(a(x) - b(x)\big)$$

is such that Novikov's condition $\mathbb{E}\big[ \exp\big( \tfrac{1}{2}\int_0^T |u(X_t)|^2\, dt \big) \big] < \infty$ is satisfied [53]. Under these
assumptions we obtain an explicit formula for the relative entropy rate of the stationary process $\{X_t\}_{t\ge 0}$ that
is the solution of (A.6) with the initial condition $X_0 \sim \nu(dx)$, where $\nu(dx)$ is the invariant distribution.

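Lemma A.6. Let $\{X_t\}_{t\ge 0}$ and $\{Y_t\}_{t\ge 0}$ be the unique solutions of (A.6)-(A.7) with the initial conditions
$X_0 \sim \nu_0(dx)$ and $Y_0 \sim \mu_0(dx)$, where $\nu_0(dx) = \nu(dx)$ is the invariant distribution for the process $\{X_t\}_{t\ge 0}$.
We define $u(x) = \sigma^{-1}(x)(a(x) - b(x))$. Denoting by $Q_{[0,T]}$ and $P_{[0,T]}$ the corresponding path probability
measures, the relative entropy is

$$\mathcal{R}(Q_{[0,T]} \,\|\, P_{[0,T]}) = \mathbb{E}_{Q_{[0,T]}}\Big[ \frac{1}{2} \int_0^T |u(X_t)|^2\, dt \Big] + \mathcal{R}(\nu_0 \,\|\, \mu_0), \tag{A.8}$$

and the relative entropy rate $\mathcal{H}(Q \,\|\, P) = \lim_{T\to\infty} \frac{1}{T}\mathcal{R}(Q_{[0,T]} \,\|\, P_{[0,T]})$ is

$$\mathcal{H}(Q \,\|\, P) = \mathbb{E}_{\nu}\Big[ \frac{1}{2} \| a(x) - b(x) \|^2_{\Sigma^{-1}(x)} \Big], \tag{A.9}$$

where $\|b\|^2_{\Sigma^{-1}} = \sum_{i,j=1}^d \Sigma^{-1}_{ij}(x)\, b_i(x)\, b_j(x)$ is the (squared) norm on $\mathbb{R}^d$ defined by the diffusion matrix
$\Sigma = \sigma(x)\sigma^T(x)$.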

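Proof. Under the assumptions on the stochastic differential equations it follows from Girsanov's Theorem
[53] that $Q_{[0,T]} \ll P_{[0,T]}$ with

$$\frac{dQ_{[0,T]}}{dP_{[0,T]}} = \frac{d\nu_0}{d\mu_0}(X_0)\, \exp\Big\{ \int_0^T u(X_s) \cdot dW_s - \frac{1}{2} \int_0^T |u(X_s)|^2\, ds \Big\}.$$

Furthermore, with $u = \sigma^{-1}(a - b)$ as in the statement, $B_t = W_t - \int_0^t u(X_s)\, ds$ is Brownian motion under
$Q_{[0,T]}$. Thus we have

$$\mathcal{R}(Q_{[0,T]} \,\|\, P_{[0,T]}) = \mathbb{E}_{Q_{[0,T]}}\Big[ \log \frac{dQ_{[0,T]}}{dP_{[0,T]}} \Big]$$
$$= \mathcal{R}(\nu_0 \,\|\, \mu_0) + \mathbb{E}_{Q_{[0,T]}}\Big[ \int_0^T u(X_s) \cdot dW_s - \frac{1}{2} \int_0^T |u(X_s)|^2\, ds \Big]$$
$$= \mathcal{R}(\nu_0 \,\|\, \mu_0) + \mathbb{E}_{Q_{[0,T]}}\Big[ \int_0^T u(X_s) \cdot \big( dB_s + u(X_s)\, ds \big) - \frac{1}{2} \int_0^T |u(X_s)|^2\, ds \Big]$$
$$= \mathcal{R}(\nu_0 \,\|\, \mu_0) + \mathbb{E}_{Q_{[0,T]}}\Big[ \frac{1}{2} \int_0^T |u(X_s)|^2\, ds \Big],$$

where in the last identity we use $\mathbb{E}_{Q_{[0,T]}}\big[ \int_0^T u(X_s) \cdot dB_s \big] = 0$, as $B_t$ is Brownian motion under $Q_{[0,T]}$. If
$X_0 \sim \nu$ and thus the process $\{X_t\}_{t\ge 0}$ is stationary we have

$$\mathbb{E}_{Q_{[0,T]}}\Big[ \frac{1}{2} \int_0^T |u(X_s)|^2\, ds \Big] = T\, \mathbb{E}_{\nu}\Big[ \frac{1}{2} |u(x)|^2 \Big],$$

from which (A.9) follows. □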

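Lemma A.7. Let the drift term $a^\theta(x)$ be parametrized by $\theta$ and assume that the mapping $\theta \to a^\theta(\cdot)$
is sufficiently smooth. Let $P^\theta_{[0,T]}$ (resp. $\mu^\theta$) be the path (resp. stationary) measure of the associated process.
Then, the path FIM is

$$\mathbf{I}_{\mathcal{H}}(P^\theta) = \mathbb{E}_{\mu^\theta}\Big[ \nabla_\theta a^\theta(x)^T\, (\sigma\sigma^T)^{-1}(x)\, \nabla_\theta a^\theta(x) \Big]. \tag{A.10}$$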


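Proof. Taylor's theorem for the drift term $a^\theta(\cdot)$ around $\theta$ reads

$$a^{\theta+\epsilon}(x) = a^\theta(x) + \nabla_\theta a^\theta(x)\,\epsilon + R_1(\theta),$$

where $\nabla_\theta a^\theta(\cdot)$ is a $d \times k$ matrix containing all the first-order partial derivatives of the drift vector (i.e., the
Jacobian matrix) while the vector $R_1(\theta)$ is the remainder term of Taylor's theorem. Then, the relative
entropy rate of the path probability measure $P^\theta_{[0,T]}$ with respect to the perturbed path probability measure
$P^{\theta+\epsilon}_{[0,T]}$ can be written as

$$\mathcal{H}(P^\theta \,\|\, P^{\theta+\epsilon}) = \frac{1}{2}\mathbb{E}_{\mu^\theta}\Big[ \big| \sigma^{-1}(x)\big( a^{\theta+\epsilon}(x) - a^\theta(x) \big) \big|^2 \Big]$$
$$= \frac{1}{2}\mathbb{E}_{\mu^\theta}\Big[ \big( \nabla_\theta a^\theta(x)\,\epsilon + R_1(\theta; x) \big)^T (\sigma\sigma^T)^{-1}(x) \big( \nabla_\theta a^\theta(x)\,\epsilon + R_1(\theta; x) \big) \Big]$$
$$= \frac{1}{2}\epsilon^T\, \mathbb{E}_{\mu^\theta}\Big[ \nabla_\theta a^\theta(x)^T (\sigma\sigma^T)^{-1}(x)\, \nabla_\theta a^\theta(x) \Big]\,\epsilon + \epsilon^T\, \mathbb{E}_{\mu^\theta}\Big[ \nabla_\theta a^\theta(x)^T (\sigma\sigma^T)^{-1}(x)\, R_1(\theta; x) \Big] + \frac{1}{2}\mathbb{E}_{\mu^\theta}\Big[ \big| \sigma^{-1}(x) R_1(\theta; x) \big|^2 \Big],$$

from which (A.10) follows.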
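Formula (A.10) can be evaluated for the OU process of Section 4.4 as a quick consistency check. The sketch below is illustrative and rests on stated assumptions: the drift is $a^\theta(x) = -\alpha(x - \beta)$ with drift parameters $\theta = (\alpha, \beta)$, the diffusion coefficient $\gamma$ is held fixed, and the stationary expectation over $N(\beta, \gamma^2/(2\alpha))$ is computed with Gauss-Hermite quadrature from NumPy. The function name and parameter values are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def ou_path_fim(alpha, beta, gamma, n_nodes=40):
    """Path FIM (A.10) for the OU drift parameters (alpha, beta), via quadrature."""
    # Nodes and weights for the weight function exp(-z^2 / 2); normalizing the
    # weights turns the quadrature sum into an expectation over N(0, 1).
    z, w = hermegauss(n_nodes)
    w = w / np.sqrt(2.0 * np.pi)
    x = beta + gamma / np.sqrt(2.0 * alpha) * z     # x ~ N(beta, gamma^2 / (2 alpha))
    # Jacobian of a(x) = -alpha (x - beta) with respect to (alpha, beta).
    jac = np.stack([-(x - beta), np.full_like(x, alpha)])   # shape (2, n_nodes)
    return (jac * w) @ jac.T / gamma**2

alpha, beta, gamma = 2.0, 1.0, 0.5                  # illustrative values
print(ou_path_fim(alpha, beta, gamma))
print(np.diag([1.0 / (2.0 * alpha), alpha**2 / gamma**2]))
```

For a drift that is quadratic at most in $x$ the quadrature is exact up to rounding, so the two printed matrices coincide; for a general drift the same routine would need the stationary law replaced by samples from a simulated trajectory.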